Preface


As did the previous editions, this textbook presents a comprehensive account of 
sampling theory as it has been developed for use in sample surveys. It contains 
illustrations to show how the theory is applied in practice, and exercises to be 
worked by the student. The book will be useful both as a text for a course on 
sample surveys in which the major emphasis is on theory and for individual 
reading by the student. 

The minimum mathematical equipment necessary to follow the great bulk of 
the material is a familiarity with algebra, especially relatively complicated algeb- 
raic expressions, plus a knowledge of probability for finite sample spaces, includ- 
ing combinatorial probabilities. The book presupposes an introductory statistics 
course that covers means and standard deviations, the normal, binomial, 
hypergeometric, and multinomial distributions, the central limit theorem, linear 
regression, and the simpler types of analyses of variance. Since much of classical 
sample survey theory deals with the distributions of estimators over the set of 
randomizations provided by the sampling plan, some knowledge of nonparamet- 
ric methods is helpful. 

The topics in this edition are presented in essentially the same order as in earlier 
editions. New sections have been included, or sections rewritten, primarily for one 
of three reasons: (1) to present introductions to topics (sampling plans or methods 
of estimation) relatively new in the field; (2) to cover further work done during the 
last 15 years on older methods, intended either to improve them or to learn more 
about the performance of rival methods; and (3) to shorten, clarify, or simplify 
proofs given in previous editions. 

New topics in this edition include the approximate methods developed for the 
difficult problem of attaching standard errors or confidence limits to nonlinear 
estimates made from the results of surveys with complex plans. These methods 
will be more and more needed as statistical analyses (e.g., regressions) are 
performed on the results. For surveys containing sensitive questions that some 
respondents are unlikely to be willing to answer truthfully, a new device is to 
present the respondent with either the sensitive question or an innocuous question;
the specific choice, made by randomization, is unknown to the interviewer.
In some sampling problems it may seem economically attractive, or essential in 
countries without full sampling resources, to use two overlapping lists (or frames, 
as they are called) to cover the complete population. The method of double 
sampling has been extended to cases where the objective is to compare the means 
of a number of subgroups within the population. There has been interesting
work on the attractive properties that the ratio and regression estimators have if it
can be assumed that the finite population is itself a random sample from an infinite
superpopulation in which a mathematical model appropriate to the ratio or
regression estimator holds. This kind of assumption is not new (I noticed
recently that Laplace used it around 1800 in a sampling problem), but it clarifies
the relation between sample survey theory and standard statistical theory. 

An example of further work on topics included in previous editions is Chapter
9A, which has been written partly from material previously in Chapter 9; this was
done mainly to give a more adequate account of what seem to me the principal 
methods produced for sampling with unequal probabilities without replacement. 
These include the similar methods given independently by Brewer, J. N. K. Rao,
and Durbin, Murthy’s method, the Rao, Hartley, Cochran method, and Madow’s 
method related to systematic sampling, with comparisons of the performances of 
the methods on natural populations. New studies have been done of the sizes of
components of errors of measurement in surveys by repeat measurements by 
different interviewers, by interpenetrating subsamples, and by a combination of 
the two approaches. For the ratio estimator, data from natural populations have 
been used to appraise the small-sample biases in the standard large-sample 
formulas for the variance and the estimated variance. Attempts have also been 
made to create less biased variants of the ratio estimator itself and of the formula 
for estimating its sampling variance. In stratified sampling there has been additional
work on allocating sample sizes to strata when more than one item is of
importance and on estimating sample errors when only one unit is to be selected 
per stratum. Some new systematic sampling methods for handling populations 
having linear trends are also of interest.

Alva L. Finkner and Emil H. Jebe prepared a large part of the lecture notes
from which the first edition of this book was written. Some investigations that 
provided background material were supported by the Office of Naval Research, 
Navy Department. From discussions of recent developments in sampling or 
suggestions about this edition, I have been greatly helped by Tore Dalenius,
David J. Finney, Daniel G. Horvitz, Leslie Kish, P. S. R. Sambasiva Rao, Martin 
Sandelius, Joseph Sedransk, Amode R. Sen, and especially Jon N. K. Rao, whose 
painstaking reading of the new and revised sections of this edition resulted in
many constructive suggestions about gaps, weaknesses, obscurities, and selection
of topics. For typing and other work involved in production of a typescript I am 
indebted to Rowena Foss, Holly Grano, and Edith Klotz. My thanks to all. 


William G. Cochran 
South Orleans, Massachusetts 
February, 1977 
CHAPTER 1


Introduction 


1.1 ADVANTAGES OF THE SAMPLING METHOD


institution in the course of several years. Travelers who spend 10 days in a foreign
country and then proceed to write a book telling the inhabitants how to revive
their industries, reform their political system, balance their budget, and improve
the food in their hotels are a familiar figure of fun. But in a real sense they differ 
from the political scientist who devotes 20 years to living and studying in the 
country only in that they base their conclusions on a much smaller sample of 
experience and are less likely to be aware of the extent of their ignorance. In 
science and human affairs alike we lack the resources to study more than a
fragment of the phenomena that might advance our knowledge. 

This book contains an account of the body of theory that has been built up to 
provide a background for good sampling methods. In most of the applications for 
which this theory was constructed, the aggregate about which information is
desired is finite and delimited—the inhabitants of a town, the machines in a 


suspicious of samples and reluctant to use them in place of censuses. Although this 
attitude no longer persists, it may be well to list the principal advantages of 
sampling as compared with complete enumeration.


Reduced Cost 


If data are secured from only a small fraction of the aggregate, expenditures are 
smaller than if a complete census is attempted. With large populations, results
accurate enough to be useful can be obtained from samples that represent only a 
small fraction of the population. In the United States the most important 
recurrent surveys taken by the government use samples of around 105,000 
persons, or about one person in 1240. Surveys used to provide facts bearing on
sales and advertising policy in market research may employ samples of only a few 
thousand. 


Greater Speed 


For the same reason, the data can be collected and summarized more quickly 
with a sample than with a complete count. This is a vital consideration when the 
information is urgently needed. 


Greater Scope 


In certain types of inquiry highly trained personnel or specialized equipment, 
limited in availability, must be used to obtain the data. A complete census is 
impracticable: the choice lies between obtaining the information by sampling or 
not at all. Thus surveys that rely on sampling have more scope and flexibility 
regarding the types of information that can be obtained. On the other hand, if 
accurate information is wanted for many subdivisions of the population, the size of 
sample needed to do the job is sometimes so large that a complete enumeration 
offers the best solution. 


Greater Accuracy 


Because personnel of higher quality can be employed and given intensive 
training and because more careful supervision of the field work and processing of 
results becomes feasible when the volume of work is reduced, a sample may
produce more accurate results than the kind of complete enumeration that can be
taken.


1.2 SOME USES OF SAMPLE SURVEYS 


To an observer of developments in sampling over the last 25 years the most 
striking feature is the rapid increase in the number and types of surveys taken by 
sampling. The Statistical Office of the United Nations publishes reports from time 
to time on “Sample Surveys of Current Interest” conducted by member countries. 
The 1968 report lists surveys from 46 countries. Many of these surveys seek 
information of obvious importance to national planning on topics such as agricul- 
tural production and land use, unemployment and the size of the labor force, 
industrial production, wholesale and retail prices, health status of the people, and 
family incomes and expenditures. But more specialized inquiries can also be 
found: for example, annual leave arrangements (Australia), causes of divorce 
(Hungary), rural debt and investment (India), household water consumption 
(Israel), radio listening (Malaysia), holiday spending (Netherlands), age structure 
of cows (Czechoslovakia), and job vacancies (United States). 

Sampling has come to play a prominent part in national decennial censuses. In 
the United States a 5% sample was introduced into the 1940 Census by asking
extra questions about occupation, parentage, fertility, and the like, of those
persons whose names fell on two of the 40 lines on each page of the schedule. The 
use of sampling was greatly extended in 1950. From a 20% sample (every fifth 
line) information was obtained on items such as income, years in school, migration,
and service in armed forces. By taking every sixth person in the 20% sample,
a further sample of 3 1/3% was created to give information on marriage and fertility.
A series of questions dealing with the condition and age of housing was split into 
five sets, each set being filled in at every fifth house. Sampling was also employed 
to speed up publication of the results. Preliminary tabulations for many important 
items, made on a sample basis, appeared more than a year and a half before the final
reports. 
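
As a quick check on the sampling fractions quoted for the 1940 and 1950 Censuses, the line-selection rules described above (two lines in 40, every fifth line, and every sixth person within the every-fifth-line sample) work out as follows. This is only the arithmetic implied by the text, not additional census detail.

```latex
\frac{2}{40} = 5\%, \qquad
\frac{1}{5} = 20\%, \qquad
\frac{1}{5} \times \frac{1}{6} = \frac{1}{30} \approx 3.3\%.
```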

This process continued in the 1960 and 1970 Censuses. Except for certain basic 
information required from every person for constitutional or legal reasons, the 
whole census was shifted to a sample basis. This change, accompanied by greatly 
increased mechanization, resulted in much earlier publication and substantial 
savings. 

In addition to their use in censuses, continuing samples are employed by 
government bureaus to obtain current information. In the United States, exam- 
ples are the Current Population Survey, which provides monthly data on the size 
and composition of the labor force and on the number of unemployed, the 
National Health Survey, and the series of samples needed for the calculation of 
the monthly Consumer Price Index. 

On a smaller scale, local governments—city, state, and county—are making 
increased use of sample surveys to obtain information needed for future planning 
and for meeting pressing problems. In the United States most large cities have 
commercial agencies that make a business of planning and conducting sample 
surveys for clients. 

Market research is heavily dependent on the sampling approach. Estimates of 
the sizes of television and radio audiences for different programs and of news- 
paper and magazine readership (including the advertisements) are kept continu- 
ally under scrutiny. Manufacturers and retailers want to know the reactions of 
people to new products or new methods of packaging, their complaints about old 
products, and their reasons for preferring one product to another. 

Business and industry have many uses for sampling in attempting to increase the 
efficiency of their internal operations. The important areas of quality control and 
acceptance sampling are outside the scope of this book. But, obviously, decisions 
taken with respect to level or change of quality or to acceptance or rejection of 
batches are well grounded only if results obtained from the sample data are valid 
(within a reasonable tolerance) for the whole batch. The sampling of records of 
business transactions (accounts, payrolls, stock, personnel)—usually much easier 
than the sampling of people—can provide serviceable information quickly and 
economically. Savings can also be made through sampling in the estimation of 
inventories, in studies of the condition and length of the life of equipment, in the 
inspection of the accuracy and rate of output of clerical work, in investigating how
key personnel distribute their working time among different tasks, and, more
generally, in the field known as operations research. The books by Deming (1960)
and Slonim (1960) contain many interesting examples showing the range of
applications of the sampling method in business.

Opinion, attitude, and election polls, which did much to bring the technique of 
sampling before the public eye, continue to be a popular feature of newspapers. In 
the field of accounting and auditing, which has employed sampling for many years, 
a new interest has arisen in adapting modern developments to the particular 
problems of this field. Thus, Neter (1972) describes how airlines and railways save 
money by using samples of records to apportion income from freight and 
passenger service. The status of sample surveys as evidence in lawsuits has also 
been subject to lively discussion. Gallup (1972) has noted the major contribution 
that sample surveys can make to the process of informed government by determining
quickly people’s opinions on proposed or new government programs and
has stressed their role as sources of information in social science. 

Sample surveys can be classified broadly into two types—descriptive and 
analytical. In a descriptive survey the objective is simply to obtain certain 
information about large groups: for example, the numbers of men, women, and 
children who view a television program. In an analytical survey, comparisons are 
made between different subgroups of the population, in order to discover whether 
differences exist among them and to form or to verify hypotheses about the 
reasons for these differences. The Indianapolis fertility survey, for instance, was
an attempt to determine the extent to which married couples plan the number and
spacing of children, the husband’s and wife’s attitudes toward this planning, the
reasons for these attitudes, and the degree of success attained (Kiser and Whelpton,
1953).


The distinction between descriptive and analytical surveys is not, of course,
clear-cut. Many surveys provide data that serve both purposes. Along with the rise
in the number of descriptive surveys, there has, however, been a noticeable 
increase in surveys taken primarily for analytical purposes, particularly in the 
study of human behavior and health. Surveys of the teeth of school children before 
and after fluoridation of water, of the death rates and causes of death of people 
who smoke different amounts, and the huge study of the effectiveness of the Salk 
polio vaccine may be cited. The study by Coleman (1966) on equality of
educational opportunity, conducted on a national sample of schools, contained
many regression analyses that estimated the relative contributions of school
characteristics, home background, and the child’s outlook to variations in exam
results.


1.3 THE PRINCIPAL STEPS IN A SAMPLE SURVEY 


As a preliminary to a discussion of the role that theory plays in a sample survey,
it is useful to describe briefly the steps involved in the planning and execution of a
survey. Surveys vary greatly in their complexity. To take a sample from 5000
cards, neatly arranged and numbered in a file, is an easy task. It is another matter 
to sample the inhabitants of a region where transport is by water through the 
forests, where there are no maps, where 15 different dialects are spoken, and 
where the inhabitants are very suspicious of an inquisitive stranger. Problems that
are baffling in one survey may be trivial or nonexistent in another. 

The principal steps in a survey are grouped somewhat arbitrarily under 11 
headings. 


Objectives of the Survey 


A lucid statement of the objectives is most helpful. Without this, it is easy in a 
complex survey to forget the objectives when engrossed in the details of planning, 
and to make decisions that are at variance with the objectives. 


Population to be Sampled 


The word population is used to denote the aggregate from which the sample is 
chosen. The definition of the population may present no problem, as when 
sampling a batch of electric light bulbs in order to estimate the average length of 
life of a bulb. In sampling a population of farms, on the other hand, rules must be 
set up to define a farm, and borderline cases arise. These rules must be usable in
practice: the enumerator must be able to decide in the field, without much 
hesitation, whether or not a doubtful case belongs to the population. 

The population to be sampled (the sampled population) should coincide with 
the population about which information is wanted (the target population). Some- 
times, for reasons of practicability or convenience, the sampled population is 
more restricted than the target population. If so, it should be remembered that 
conclusions drawn from the sample apply to the sampled population. Judgment 
about the extent to which these conclusions will also apply to the target population 
must depend on other sources of information. Any supplementary information
that can be gathered about the nature of the differences between sampled and
target population may be helpful.


Data to be Collected 


It is well to verify that all the data are relevant to the purposes of the survey and 
that no essential data are omitted. There is frequently a tendency, particularly
with human populations, to ask too many questions, some of which are never 
subsequently analyzed. An overlong questionnaire lowers the quality of the 
answers to important as well as unimportant questions. 


Degree of Precision Desired


The results of sample surveys are always subject to some uncertainty because 
only part of the population has been measured and because of errors of measure- 
ment. This uncertainty can be reduced by taking larger samples and by using 
superior instruments of measurement. But this usually costs time and money.
Consequently, the specification of the degree of precision wanted in the results is
an important step. This step is the responsibility of the person who is going to use
the data. It may present difficulties, since many administrators are unaccustomed
to thinking in terms of the amount of error that can be tolerated in estimates,
consistent with making good decisions. The statistician can often help at this stage.


Methods of Measurement 


There may be a choice of measuring instrument and of method of approach to 
the population. Data about a person’s state of health may be obtained from 
statements that he or she makes or from a medical examination. The survey may 
employ a self-administered questionnaire, an interviewer who reads a standard 
set of questions with no discretion, or an interviewing process that allows much 
latitude in the form and ordering of the questions. The approach may be by mail, 
by telephone, by personal visit, or by a combination of the three. Much study has 
been made of interviewing methods and problems (see, e.g., Hyman, 1954 and 
Payne, 1951). 

A major part of the preliminary work is the construction of record forms on 
which the questions and answers are to be entered. With simple questionnaires, 
the answers can sometimes be precoded—that is, entered in a manner in which 
they can be routinely transferred to mechanical equipment. In fact, for the 
construction of good record forms, it is necessary to visualize the structure of the 
final summary tables that will be used for drawing conclusions.


The Frame 


Before selecting the sample, the population must be divided into parts that are
called sampling units, or units. These units must cover the whole of the population
and they must not overlap, in the sense that every element in the population
belongs to one and only one unit. Sometimes the appropriate unit is obvious, as in
a population of light bulbs, in which the unit is the single bulb. Sometimes there is
a choice of unit. In sampling the people in a town, the unit might be an individual
person, the members of a family, or all persons living in the same city block. In
sampling an agricultural crop, the unit might be a field, a farm, or an area of land 
whose shape and dimensions are at our disposal. 

The construction of this list of sam; 

major practical problems. From b 
critical attitude toward lists that have been 
Despite assurances to the contrary, such lists are often found to be incomplete, or 
partly illegible, or to contain an un 
Selection of the Sample


There is now a variety of plans by which the sample may be selected. For each 
plan that is considered, rough estimates of the size of sample can be made from a 
knowledge of the degree of precision desired. The relative costs and time involved
for each plan are also compared before making a decision.


The Pretest 


It has been found useful to try out the questionnaire and the field methods on a 
small scale. This nearly always results in improvements in the questionnaire and
may reveal other troubles that will be serious on a large scale, for example, that the 
cost will be much greater than expected. 


Organization of the Field Work 


In extensive surveys many problems of business administration are met. The 
personnel must receive training in the purpose of the survey and in the methods of
measurement to be employed and must be adequately supervised in their work. A 
procedure for early checking of the quality of the returns is invaluable. Plans must 
be made for handling nonresponse, that is, the failure of the enumerator to obtain
information from certain of the units in the sample. 


Summary and Analysis of the Data


The first step is to edit the completed questionnaires, in the hope of amending 
recording errors, or at least of deleting data that are obviously erroneous. 
Decisions about computing procedure are needed in cases in which answers to 
certain questions were omitted by some respondents or were deleted in the editing 
process. Thereafter, the computations that lead to the estimates are performed. 
Different methods of estimation may be available for the same data. 

In the presentation of results it is good practice to report the amount of error to 
be expected in the most important estimates. One of the advantages of probability 
sampling is that such statements can be made, although they have to be severely 
qualified if the amount of nonresponse is substantial. 


Information Gained for Future Surveys 


The more information we have initially about a population, the easier it is to 
devise a sample that will give accurate estimates. Any completed sample is
potentially a guide to improved future sampling, in the data that it supplies about 
the means, standard deviations, and nature of the variability of the principal 
measurements and about the costs involved in getting the data. Sampling practice 
advances more rapidly when provisions are made to assemble and record informa- 
tion of this type. 

There is another important respect in which any completed sample facilitates 
future samples. Things never go exactly as planned in a complex survey. The alert 
sampler learns to recognize mistakes in execution and to see that they do not occur
in future surveys. 


1.4 THE ROLE OF SAMPLING THEORY 


This list of the steps in a sample survey has been given in order to emphasize that 
sampling is a practical business, which calls for several different types of skill. In 
some of the steps (the definition of the population, the determination of the data
to be collected and of the methods of measurement, and the organization of the
field work) sampling theory plays at most a minor role. Although these topics are
not discussed further in this book, their importance should be realized. Sampling
demands attention to all phases of the activity: poor work in one phase may ruin a
survey in which everything else is done well.


principle of specified precision at 
tion of theory. 


In order to apply this principle, we must be able to predict, for any sampling
procedure that is under consideration, the precision and the cost to be expected.
So far as precision is concerned, we cannot foretell exactly how large an error will
be present in an estimate in any specific situation, for this 
knowledge of the true value for th 
sampling procedure is judged by exa 
large surveys in which many different measurements with differing frequency
distributions are made on the units. In surveys in which only a few measurements 
per unit are made, studies of their frequency distributions may justify the 
assumption of known mathematical forms, permitting the results from classical 
theory to be applied. 

A second difference is that the populations in survey work contain a finite 
number of units. Results are slightly more complicated when sampling is from a 
finite instead of an infinite population. For practical purposes these differences in 
results for finite and infinite populations can often be ignored. Cases in which this 
is not so will be pointed out. 


1.5 PROBABILITY SAMPLING


The sampling procedures considered in this book have the following mathemat- 
ical properties in common. 


1. We are able to define the set of distinct samples, $S_1, S_2, \ldots, S_\nu$, which the
procedure is capable of selecting if applied to a specific population. This means
that we can say precisely what sampling units belong to $S_1$, to $S_2$, and so on. For
example, suppose that the population contains six units, numbered 1 to 6. A
common procedure for choosing a sample of size 2 gives three possible
candidates: $S_1 \sim (1, 4)$; $S_2 \sim (2, 5)$; $S_3 \sim (3, 6)$. Note that not all possible samples
of size 2 need be included.

2. Each possible sample $S_i$ has assigned to it a known probability of selection
$\pi_i$.

3. We select one of the $S_i$ by a random process in which each $S_i$ receives its
appropriate probability $\pi_i$ of being selected. In the example we might assign equal
probabilities to the three samples. Then the draw itself can be made by choosing a
random number between 1 and 3. If this number is $j$, $S_j$ is the sample that is taken.

4. The method for computing the estimate from the sample must be stated and 
must lead to a unique estimate for any specific sample. We may declare, for 
example, that the estimate is to be the average of the measurements on the 
individual units in the sample. 


For any sampling procedure that satisfies these properties, we are in a position 
to calculate the frequency distribution of the estimates it generates if repeatedly 
applied to the same population. We know how frequently any particular sample $S_i$
will be selected, and we know how to calculate the estimate from the data in $S_i$. It is
clear, therefore, that a sampling theory can be developed for any procedure of this 
type, although the details of the development may be intricate. The term 
probability sampling refers to a method of this type. 
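
The frequency distribution of the estimates can be written out directly for the six-unit example above. The following is a minimal sketch: the three samples and their equal probabilities come from the example, while the measurements in `y` and the helper names are hypothetical values chosen only for illustration.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical measurements on the six units (assumed; not from the text).
y = {1: 2, 2: 7, 3: 4, 4: 9, 5: 5, 6: 8}

# The three possible samples of the example, each selected with probability 1/3.
samples = {"S1": (1, 4), "S2": (2, 5), "S3": (3, 6)}
prob = {name: Fraction(1, 3) for name in samples}


def sample_mean(units):
    """Estimate: the average of the measurements on the units in the sample."""
    return Fraction(sum(y[u] for u in units), len(units))


def sampling_distribution(samples, prob):
    """Return {estimate: probability}, accumulated over all possible samples."""
    dist = Counter()
    for name, units in samples.items():
        dist[sample_mean(units)] += prob[name]
    return dict(dist)


dist = sampling_distribution(samples, prob)
for estimate, p in sorted(dist.items()):
    print(f"estimate = {float(estimate):.1f}  probability = {p}")
```

With these assumed measurements two of the three samples happen to give the same estimate, so the distribution has only two distinct values. Exact fractions are used so that the probabilities sum to exactly 1.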

In practice we seldom draw a probability sample by writing down the $S_i$ and $\pi_i$
as outlined above. This is intolerably laborious with a large population, where a
sampling procedure may produce billions of possible samples. The draw is most
commonly made by specifying probabilities of inclusion for the individual units
and drawing units, one by one or in groups, until the sample of desired size and type
is constructed. For the purposes of a theory it is sufficient to know that we could
write down the $S_i$ and $\pi_i$ if we wanted to and had unlimited time.
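
Drawing units one by one can be sketched as follows, under the simplest assumption that every unit not yet drawn is equally likely at each step. The population of 1000 numbered units, the sample size of 10, the seed, and the function name `draw_one_by_one` are all hypothetical choices for illustration. The count of possible samples shows why listing them is impractical.

```python
import math
import random


def draw_one_by_one(units, n, rng):
    """Draw n distinct units one at a time; each remaining unit is equally likely."""
    remaining = list(units)
    chosen = []
    for _ in range(n):
        idx = rng.randrange(len(remaining))  # pick a position among those left
        chosen.append(remaining.pop(idx))    # remove it so it cannot recur
    return chosen


rng = random.Random(12345)        # fixed seed only to make the sketch repeatable
population = range(1, 1001)       # hypothetical population of 1000 numbered units
sample = draw_one_by_one(population, 10, rng)

# Number of distinct samples of size 10 that this procedure could produce.
n_possible = math.comb(1000, 10)

print(sorted(sample))
print(n_possible)
```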


1.6 ALTERNATIVES TO PROBABILITY SAMPLING 


The following are some common types of nonprobability sampling. 


1. The sample is restricted to a part of the population that is readily accessible. 
A sample of coal from an open wagon may be taken from the top 6 to 9 in. 

2. The sample is selected haphazardly. In picking 10 rabbits from a large cage 
in a laboratory, the investigator may take those that his hands rest on, without 
conscious planning. 

3. With a small but heterogeneous population, the sampler inspects the whole 
of it and selects a small sample of “typical” units—that is, units that are close to his 
impression of the average of the population. 

4. The sample consists essentially of volunteers, in studies in which the 
measuring process is unpleasant or troublesome to the person being measured. 


Under the right conditions, any of these methods can give useful results. They 
are not, however, amenable to the development of a sampling theory that is 
model-free, since no element of random selection is involved. About the only way 
of examining how good one of them may be is to find a situation in which the 
results are known, either for the whole population or for a probability sample, and 
make comparisons. Even if a method appears to do well in one such comparison, 
this does not guarantee that it will do well under different circumstances. 

In this connection, some of the earliest uses of sampling by country and city 
governments from 1850 onward were intended to save money in making esti- 
mates from the results of a Census. For the most important items in the Census, 
the country or city totals were calculated from the complete Census data. For 
other items a sample of say 15 or 25% of the Census returns was selected in order 
to lighten the work of estimating country or city totals for these items. Two rival 
methods of sample selection came into use. One, called random selection, was an 
application of probability sampling in which each unit in the population (e.g., each 
Census return) had an equal chance of being included in the sample. For this 
method it was realized that by use of sampling theory and the normal distribution, 
as noted previously, the sampler could predict approximately from the sample 
data the amount of error to be expected in the estimates made from the sample. 
Moreover, for the most important items for which complete Census data were
available, he could check to some extent the accuracy of the predictions.

The other method was purposive selection. This was not specifically defined in 
detail but usually had two common features. The sampling unit consisted of 
groups of returns, often relatively large groups. For example, in the 1921 Italian 
Census the country consisted of 8354 communes grouped into 214 districts. In
drawing a 14% sample, the Italian statisticians Gini and Galvani selected 29
districts purposively rather than 1250 communes. Second, the 29 districts were 
chosen so that the sample gave accurate estimates for 7 important control 
variables for which results were known for the whole country. The hope was that 
this sample would give good estimates for other variables highly correlated with 
the control variables. 

In the 1920s the International Statistical Institute appointed a commission to 
report on the advantages and disadvantages of the two methods. The report, by 
Jensen (1926), seemed on balance to favor purposive selection. However, pur- 
posive selection was abandoned relatively soon as a method of sampling for 
obtaining national estimates in surveys in which many items were measured. It 
lacked the flexibility that later developments of probability sampling produced, it 
was unable to predict from the sample the accuracy to be expected in the 
estimates, and it used sampling units that were too large. Gini and Galvani 
concluded that the probability method called stratified random sampling (Chapter 
5), with the commune as a sampling unit, would have given better results than 
their method. 


1.7 USE OF THE NORMAL DISTRIBUTION 


It is sometimes useful to employ the word estimator to denote the rule by which
an estimate of some population characteristic $\mu$ is calculated from the sample
results, the word estimate being applied to the value obtained from a specific
sample. An estimator $\hat{\mu}$ of $\mu$ given by a sampling plan is called unbiased if the
mean value of $\hat{\mu}$, taken over all possible samples provided by the plan, is equal to
$\mu$. In the notation of section 1.5, this condition may be written

$$E(\hat{\mu}) = \sum_{i=1}^{\nu} \pi_i \hat{\mu}_i = \mu$$

where $\hat{\mu}_i$ is the estimate given by the ith sample. The symbol E, which stands for
“the expected value of,” is used frequently.

As mentioned in section 1.4, the samples in surveys are often large enough so
that estimates made from them are approximately normally distributed. Furthermore,
with probability sampling, we have formulas that give the mean and
variance of the estimates. Suppose that we have taken a sample by a procedure
known to give an unbiased estimator and have computed the sample estimate $\hat{\mu}$
and its standard deviation $\sigma_{\hat{\mu}}$ (often called, alternatively, its standard error). How
good is the estimate? We cannot know the exact value of the error of estimate
$(\hat{\mu} - \mu)$ but, from the properties of the normal curve, the chances are


0.32 (about 1 in 3) that the absolute error $|\hat{\mu} - \mu|$ exceeds $\sigma_{\hat{\mu}}$
0.05 (1 in 20) that the absolute error $|\hat{\mu} - \mu|$ exceeds $1.96\sigma_{\hat{\mu}} \approx 2\sigma_{\hat{\mu}}$
0.01 (1 in 100) that the absolute error $|\hat{\mu} - \mu|$ exceeds $2.58\sigma_{\hat{\mu}}$

For example, if a probability sample of the records of batteries in routine use in
a large factory shows an average life $\hat{\mu} = 394$ days, with a standard error
$\sigma_{\hat{\mu}} = 4.6$ days, the chances are 99 in 100 that the average life in the
population of batteries lies between

$$\hat{\mu}_L = 394 - (2.58)(4.6) = 382 \text{ days}$$

and

$$\hat{\mu}_U = 394 + (2.58)(4.6) = 406 \text{ days}$$


The limits, 382 days and 406 days, are called lower and upper confidence limits.
With a single estimate from a single survey, the statement “$\mu$ lies between 382 and
406 days” is not certain to be correct. The “99% confidence” figure implies that if
the same sampling plan were used many times in a population, a confidence
statement being made from each sample, about 99% of these statements would be
correct and 1% wrong. When sampling is being introduced into an operation in
which complete censuses have previously been used, a demonstration of this
property is sometimes made by drawing repeated samples of the type proposed
from a population for which complete records exist, so that $\mu$ is known (see, e.g.,
Trueblood and Cyert, 1957). The practical verification that approximately the
stated proportion of statements is correct does much to educate and reassure
administrators about the nature of sampling. Similarly, when a single sample is
taken from each of a series of different populations, about 95% of the 95%
confidence statements are correct.
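
The arithmetic of the battery example can be checked with a short sketch. The function names below are illustrative and not part of the text; the only inputs are the estimate of 394 days, the standard error of 4.6 days, and the normal deviate 2.58 used above.

```python
import math

def normal_confidence_limits(estimate, std_error, z):
    """Return (lower, upper) limits: estimate -/+ z * std_error."""
    half_width = z * std_error
    return estimate - half_width, estimate + half_width

def two_sided_tail(z):
    """P(|Z| > z) for a standard normal deviate Z."""
    return math.erfc(z / math.sqrt(2.0))

# Battery example: estimate 394 days, standard error 4.6 days, 99% limits.
lower, upper = normal_confidence_limits(394.0, 4.6, 2.58)
print(round(lower), round(upper))

# Chances that the absolute error exceeds 1, 1.96, and 2.58 standard errors.
for z in (1.0, 1.96, 2.58):
    print(z, round(two_sided_tail(z), 2))
```

The tail probabilities printed by the loop correspond to the three chances listed earlier in this section, under the same normality assumption.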


The preceding discussion assumes that $\sigma_{\hat{\mu}}$, as computed from the sample, is
known exactly. Actually, $\sigma_{\hat{\mu}}$, like $\hat{\mu}$, is subject to a sampling error. With a normally
distributed variable, tables of Student's t distribution are used instead of the
normal tables to calculate confidence limits for $\mu$ when the sample is small.
Replacement of the normal table by the t table makes almost no difference if the
number of degrees of freedom in $\sigma_{\hat{\mu}}$ exceeds 50. With certain types of stratified
sampling and with the method of replicated sampling (section 11.19) the degrees
of freedom are small and the t table is needed.


1.8 BIAS AND ITS EFFECTS 


In sample survey theory it is necessary to consider biased estimators for two
reasons.


1. In some of the most common problems, particularly in the estimation of
ratios, estimators that are otherwise convenient and suitable are found to be
biased.

2. Even with estimators that are unbiased in probability sampling, errors of
measurement and nonresponse may produce biases in the numbers that we are able
to compute from the data. This happens, for instance, if the persons who refuse
to be interviewed are almost all opposed to some expenditure of public funds,
whereas those who are interviewed are split evenly for and against.

Fig. 1.1 Effect of bias on errors of estimation.


To examine the effect of bias, suppose that the estimate $\hat{\mu}$ is normally
distributed about a mean m that is a distance B from the true population value $\mu$,
as shown in Fig. 1.1. The amount of bias is $B = m - \mu$. Suppose that we do not
know that any bias is present. We compute the standard deviation $\sigma$ of the
frequency distribution of the estimate; this will, of course, be the standard
deviation about the mean m of the distribution, not about the true mean $\mu$. We are
using $\sigma$ in place of $\sigma_{\hat{\mu}}$. As a statement about the accuracy of the estimate, we
declare that the probability is 0.05 that the estimate $\hat{\mu}$ is in error by more than
$1.96\sigma$.

We will consider how the presence of bias distorts this probability. To do this,
we calculate the true probability that the estimate is in error by more than $1.96\sigma$,
where error is measured from the true mean $\mu$. The two tails of the distribution
must be examined separately. For the upper tail, the probability of an error of
more than $+1.96\sigma$ is the shaded area above Q in Fig. 1.1. This area is given by

$$\frac{1}{\sigma\sqrt{2\pi}} \int_{\mu+1.96\sigma}^{\infty} e^{-(\hat{\mu}-m)^2/2\sigma^2} \, d\hat{\mu}$$

Put $\hat{\mu} - m = \sigma t$. The lower limit of the range of integration for t is

$$\frac{\mu - m}{\sigma} + 1.96 = 1.96 - \frac{B}{\sigma}$$

Thus the area is

$$\frac{1}{\sqrt{2\pi}} \int_{1.96-(B/\sigma)}^{\infty} e^{-t^2/2} \, dt$$

Similarly, the lower tail, that is, the shaded area below P, has an area

$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{-1.96-(B/\sigma)} e^{-t^2/2} \, dt$$


From the form of the integrals it is clear that the amount of disturbance depends 
solely on the ratio of the bias to the standard deviation. The results are shown in 
Table 1.1. 


TABLE 1.1

EFFECT OF A BIAS B ON THE PROBABILITY OF AN ERROR
GREATER THAN $1.96\sigma$

            Probability of Error
$B/\sigma$    $< -1.96\sigma$    $> 1.96\sigma$    Total
0.02        0.0238             0.0262            0.0500
0.04        0.0228             0.0274            0.0502
0.06        0.0217             0.0287            0.0504
0.08        0.0207             0.0301            0.0508
0.10        0.0197             0.0314            0.0511
0.20        0.0154             0.0392            0.0546
0.40        0.0091             0.0594            0.0685
0.60        0.0052             0.0869            0.0921
0.80        0.0029             0.1230            0.1259
1.00        0.0015             0.1685            0.1700
1.50        0.0003             0.3228            0.3231
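
The entries of Table 1.1 follow directly from the two tail integrals, so they can be recomputed with the standard normal distribution function. The sketch below is illustrative only; the helper names `phi` and `tail_probabilities` are assumptions and do not appear in the text.

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def tail_probabilities(bias_ratio, z=1.96):
    """Lower-tail, upper-tail, and total probability of an error larger than
    z * sigma when the estimate carries a bias B = bias_ratio * sigma."""
    lower = phi(-z - bias_ratio)          # area below P
    upper = 1.0 - phi(z - bias_ratio)     # area above Q
    return lower, upper, lower + upper

for ratio in (0.02, 0.10, 0.20, 0.60, 1.00, 1.50):
    lower, upper, total = tail_probabilities(ratio)
    print(f"{ratio:4.2f}  {lower:.4f}  {upper:.4f}  {total:.4f}")
```

Small differences in the last digit of the total can arise because the tabulated total is the sum of two rounded tail areas.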


For the total probability of an error of more than $1.96\sigma$, the bias has little effect
provided that it is less than one tenth of the standard deviation. At this point the
total probability is 0.0511 instead of the 0.05 that we think it is. As the bias
increases further, the disturbance becomes more serious. At $B = \sigma$, the total
probability of error is 0.17, more than three times the presumed value.

The two tails are affected differently. With a positive bias, as in this example, the
probability of an underestimate by more than $1.96\sigma$ shrinks rapidly from the
presumed 0.025 to become negligible when $B = \sigma$. The probability of the
corresponding overestimate mounts steadily. In most applications the total error is the
primary interest, but occasionally we are particularly interested in errors in one
direction.

As a working rule, the effect of bias on the accuracy of an estimate is negligible if
the bias is less than one tenth of the standard deviation of the estimate. If we have
a biased method of estimation for which $B/\sigma < 0.1$, where B is the absolute value
of the bias, it can be claimed that the bias is not an appreciable disadvantage of the
method. Even with $B/\sigma = 0.2$, the disturbance in the probability of the total error
is modest.

In using these results, a distinction must be made between the two sources of
bias mentioned at the beginning of this section. With biases of the type that arise in
estimating ratios, an upper limit to the ratio $B/\sigma$ can be found mathematically. If
the sample is large enough, we can be confident that $B/\sigma$ will not exceed 0.1. With
biases caused by errors of measurement or nonresponse, on the other hand, it is
usually impossible to find a guaranteed upper limit to $B/\sigma$ that is small. This
troublesome problem is discussed in Chapter 13.


1.9 THE MEAN SQUARE ERROR 


In order to compare a biased estimator with an unbiased estimator, or two 
estimators with different amounts of bias, a useful criterion is the mean square 
error (MSE) of the estimate, measured from the population value that is being 
estimated. Formally, 


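$$\begin{aligned}
\text{MSE}(\hat{\mu}) &= E(\hat{\mu} - \mu)^2 = E[(\hat{\mu} - m) + (m - \mu)]^2 \\
&= E(\hat{\mu} - m)^2 + 2(m - \mu)E(\hat{\mu} - m) + (m - \mu)^2 \\
&= (\text{variance of } \hat{\mu}) + (\text{bias})^2
\end{aligned}$$

the cross-product term vanishing since $E(\hat{\mu} - m) = 0$.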

Use of the MSE as a criterion of the accuracy of an estimator amounts to
regarding two estimates that have the same MSE as equivalent. This is not strictly
correct because the frequency distributions of errors $(\hat{\mu} - \mu)$ of different sizes will
not be the same for the two estimates if they have different amounts of bias. It has
been shown, however, by Hansen, Hurwitz, and Madow (1953) that if $B/\sigma$ is less
than about one half, the two frequency distributions are almost identical in regard
to absolute errors $|\hat{\mu} - \mu|$ of different sizes. Table 1.2 illustrates this result.


TABLE 1.2

PROBABILITY OF AN ABSOLUTE ERROR $\geq 1\sqrt{\text{MSE}}$,
$1.96\sqrt{\text{MSE}}$ AND $2.576\sqrt{\text{MSE}}$

                          Probability
$B/\sigma$    $1\sqrt{\text{MSE}}$    $1.96\sqrt{\text{MSE}}$    $2.576\sqrt{\text{MSE}}$
0             0.317                   0.0500                     0.0100
0.2           0.317                   0.0499                     0.0100
0.4           0.319                   0.0495                     0.0095
0.6           0.324                   0.0479                     0.0083
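
Table 1.2 can be recomputed in the same way, assuming a normally distributed estimate with standard deviation $\sigma$ and bias B, so that $\sqrt{\text{MSE}} = \sigma\sqrt{1 + (B/\sigma)^2}$. This is a sketch with illustrative function names, not a procedure given in the text.

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def prob_abs_error_exceeds(k, bias_ratio):
    """P(|estimate - true value| > k * sqrt(MSE)) for a normal estimate
    with standard deviation sigma and bias bias_ratio * sigma."""
    root_mse = math.sqrt(1.0 + bias_ratio ** 2)   # sqrt(MSE) in units of sigma
    c = k * root_mse
    upper = 1.0 - phi(c - bias_ratio)             # error above +c * sigma
    lower = phi(-c - bias_ratio)                  # error below -c * sigma
    return upper + lower

for ratio in (0.0, 0.2, 0.4, 0.6):
    row = [prob_abs_error_exceeds(k, ratio) for k in (1.0, 1.96, 2.576)]
    print(f"{ratio:3.1f}  {row[0]:.3f}  {row[1]:.4f}  {row[2]:.4f}")
```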

Even at $B/\sigma = 0.6$, the changes in the probabilities as compared with those for
$B/\sigma = 0$ are slight.

Because of the difficulty of ensuring that no unsuspected bias enters into
estimates, we will usually speak of the precision of an estimate instead of its
accuracy. Accuracy refers to the size of deviations from the true mean $\mu$, whereas
precision refers to the size of deviations from the mean m obtained by repeated
application of the sampling procedure.


EXERCISES 


1.1 Suppose that you were using sampling to estimate the total number of words in a 
book that contains illustrations. 

(a) Is there any problem of definition of the population? (b) What are the pros and cons 
of (1) the page, (2) the line, as a sampling unit? 

1.2 A sample is to be taken from a list of names that are on cards (one name to a card) 
numbered consecutively in a file. Each name is to have an equal chance of being drawn in 
the sample. What problems arise in the following common situations? (a) Some of the 
names do not belong to the target population, although this fact cannot be verified for any 
name until it has been drawn. (b) Some names appear on more than one card. All cards with 
the same name bear consecutive numbers and therefore appear together in the file. (c) 
Some names appear on more than one card, but cards bearing the same name may be 
scattered anywhere about the file. 


1.3 The problem of finding a frame that is complete and enables the sample to be drawn 
is often an obstacle. What kinds of frames might be tried for the following surveys? Have 
the frames any serious weaknesses? (a) A survey of stores that sell luggage in a large city. 
(b) A survey of the kinds of articles left behind in subways or buses. (c) A survey of persons 
bitten by snakes during the last year. (d) A survey to estimate the number of hours per week 
spent by family members in watching television. 

1.4 A city directory, 4 years old, lists the addresses in order along each street, and gives 
the names of the persons living at each address. For a current interview survey of the people 
in the city, what are the deficiencies of this frame? Can they be remedied by the
interviewers during the course of the field work? In using the directory, would you draw a
list of addresses (dwelling places) or a list of persons?


1.5 In estimating by sampling the actual value of the small items in the inventory of a 
large firm, the actual and the book value were recorded for each item in the sample. For the 
total sample, the ratio of actual to book value was 1.021, this estimate being approximately 
normally distributed with a standard error of 0.0082. If the book value of the inventory is 
$80,000, compute 95% confidence limits for the actual value. 

1.6 Frequently data must be treated as a sample, although at first sight they appear to 
be a complete enumeration. A proprietor of a parking lot finds that business is poor on 
Sunday mornings. After 26 Sundays in operation, his average receipts per Sunday morning 
are exactly $10. The standard error of this figure, computed from week-to-week variations, 
is $1.2. The attendant costs $7 each Sunday. The proprietor is willing to keep the lot open at
this time if his expected future profit is $5 per Sunday morning. What is the confidence 
probability that the long-term profit rate will be at least $5? What assumption must be 
made in order to answer this question? 

1.7 In Table 1.2, what happens to the probability of exceeding $1\sqrt{\text{MSE}}$, $1.96\sqrt{\text{MSE}}$,
and $2.576\sqrt{\text{MSE}}$ when $B/\sigma$ tends to infinity, that is, when the MSE is due entirely to bias?
Do your results agree with the directions of the changes noted in Table 1.2 as $B/\sigma$ moves
from 0 to 0.6?

1.8 When it is necessary to compare two estimates that have different frequency
distributions of errors $(\hat{\mu} - \mu)$, it is occasionally possible, in specialized problems, to
compute the cost or loss that will result from an error $(\hat{\mu} - \mu)$ of any given size. The estimate
that gives the smaller expected loss is preferred, other things being equal. Show that if the
loss is a quadratic function $\lambda(\hat{\mu} - \mu)^2$ of the error, we should choose the estimate with the
smaller mean square error.


CHAPTER 2 


Simple Random Sampling 


2.1 SIMPLE RANDOM SAMPLING 


Simple random sampling is a method of selecting n units out of the N such that
every one of the ${}_N C_n$ distinct samples has an equal chance of being drawn. In
practice a simple random sample is drawn unit by unit. The units in the population
are numbered from 1 to N. A series of random numbers between 1 and N is then
drawn, either by means of a table of random numbers or by means of a computer
program that produces such a table. At any draw the process used must give an
equal chance of selection to any number in the population not already drawn. The
units that bear these n numbers constitute the sample.

It is easily verified that all ${}_N C_n$ distinct samples have an equal chance of being
selected by this method. Consider one distinct sample, that is, one set of n
specified units. At the first draw the probability that some one of the n specified
units is selected is n/N. At the second draw the probability that some one of the
remaining (n − 1) specified units is drawn is (n − 1)/(N − 1), and so on. Hence
the probability that all n specified units are selected in n draws is

$$\frac{n}{N} \cdot \frac{(n-1)}{(N-1)} \cdot \frac{(n-2)}{(N-2)} \cdots \frac{1}{(N-n+1)} = \frac{n!(N-n)!}{N!} = \frac{1}{{}_N C_n} \qquad (2.1)$$


Since a number that has been drawn is removed from the population for all 
subsequent draws, this method is also called random sampling without replacement.
Random sampling with replacement is entirely feasible: at any draw, all
N members of the population are given an equal chance of being drawn, no matter
how often they have already been drawn. The formulas for the variances and 
estimated variances of estimates made from the sample are often simpler when 
sampling is with replacement than when it is without replacement. For this reason 
sampling with replacement is sometimes used in the more complex sampling 
plans, although at first sight there seems little point in having the same unit two or 
more times in the sample. 
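
Equation (2.1) can be verified for particular values of N and n by multiplying the draw-by-draw probabilities exactly. The sketch below uses exact fractions; the function name and the values N = 10, n = 4 are chosen only for illustration.

```python
from fractions import Fraction
from math import comb

def prob_specified_sample(N, n):
    """Probability that n specified units are the ones selected in n
    successive draws without replacement (left-hand side of 2.1)."""
    p = Fraction(1)
    for k in range(n):
        # At draw k + 1, (n - k) specified units remain among (N - k) units.
        p *= Fraction(n - k, N - k)
    return p

N, n = 10, 4
print(prob_specified_sample(N, n), Fraction(1, comb(N, n)))
```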

2.2 SELECTION OF A SIMPLE RANDOM SAMPLE


Tables of random numbers are tables of the digits 0, 1, 2, ..., 9, each digit
having an equal chance of selection at any draw. Among the larger tables are those 
published by the Rand Corporation (1955)—1 million digits—and by Kendall and 
Smith (1938)—100,000 digits. Numerous tables are available, many in standard 
statistical texts. Table 2.1 shows 1000 random digits for illustration, from 
Snedecor and Cochran (1967). 


TABLE 2.1 
ONE THOUSAND RANDOM DIGITS 


00-04 05-09 10-14 15-19 20-24 25-29 30-34 35-39 40-44 45-49 


00 54463 22662 65905 70639 79365 67382 29085 69831 47058 08186
01 15389 85205 18850 39226 42249 90669 96325 23248 60933 26927
02 85941 40756 82414 02015 13858 78030 16269 65978 01385 15345
03 61149 69440 11286 88218 58925 03638 52862 62733 33451 77455
04 05219 81619 10651 67079 92511 59888 84502 72095 83463 75577


05 41417 98326 87719 92294 46614 50948 64886 20002 97365 30976
06 28357 94070 20652 35774 16249 75019 21145 05217 47286 76305
07 17783 00015 10806 83091 91530 36466 39981 62481 49177 75779
08 40950 84820 29881 85966 62800 70326 84740 62660 77379 90279
09 82995 64157 66164 41180 10089 41757 78258 96488 88629 37231


10 96754 17676 55659 44105 47361 34833 86679 23930 53249 27083 
11 34357 88040 53364 71726 45690 66334 60332 22554 90600 71113 
12 06318 37403 49927 57715 50423 67372 63116 48888 21505 80182 
13 62111 52820 07243 79931 89292 84767 85693 73947 22278 11551 
14 47534 09243 67879 00544 23410 12740 02540 54440 32949 13491 


15 98614 75993 84460 62846 59844 14922 48730 73443 48167 34770 
16 24856 03648 44898 09351 98795 18644 39765 71058 90368 44104 
17 96887 12479 80621 66223 86085 78285 02432 53342 42846 94771 
18 90801 21472 42815 77408 37390 76766 52615 32141 30268 18106 
19 55165 77312 83666 36028 28420 70219 81369 41943 47366 41067 


In using these tables to select a simple random sample, the first step is to number
the units in the population from 1 to N. If the first digit of N is a number between 5
and 9, the following method of selection is adequate. Suppose N = 528, and we
want n = 10. Select three columns from Table 2.1, say columns 25 to 27. Go down
the three columns, selecting the first 10 distinct numbers between 001 and 528.
These are 36, 509, 364, 417, 348, 127, 149, 186, 290, and 162. For the last two
numbers we jumped to columns 30 to 32. In repeated selections it is advisable to
vary the starting point in the table.
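
The selection just described can be traced mechanically. In the sketch below the two lists are the entries of columns 25-29 and 30-34 of Table 2.1, copied row by row; the function name `select_by_rejection` is an assumption made for illustration.

```python
# Columns 25-29 and 30-34 of Table 2.1, rows 00-19.
COLS_25_29 = [
    "67382", "90669", "78030", "03638", "59888",
    "50948", "75019", "36466", "70326", "41757",
    "34833", "66334", "67372", "84767", "12740",
    "14922", "18644", "78285", "76766", "70219",
]
COLS_30_34 = [
    "29085", "96325", "16269", "52862", "84502",
    "64886", "21145", "39981", "84740", "78258",
    "86679", "60332", "63116", "85693", "02540",
    "48730", "39765", "02432", "52615", "81369",
]

def select_by_rejection(entries, N, n):
    """Read the first three digits of each entry and keep the first n
    distinct numbers between 1 and N; every other number is rejected."""
    chosen = []
    for entry in entries:
        number = int(entry[:3])
        if 1 <= number <= N and number not in chosen:
            chosen.append(number)
        if len(chosen) == n:
            break
    return chosen

# Go down columns 25-27 first, then continue with columns 30-32.
sample = select_by_rejection(COLS_25_29 + COLS_30_34, N=528, n=10)
print(sample)
```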


The disadvantage of this method is that the three-digit numbers 000 and 529 to
999 are not used, although skipping numbers does not waste much time. When the 
first digit of N is less than 5, some may still prefer this method if n is small and a 
large table of random digits is available. 

With N = 128, for example, a second method that involves less rejection and is 
easily applied is as follows. In a series of three-digit numbers, subtract 200 from all 
numbers between 201 and 400, 400 from all numbers between 401 and 600, 600 
from all numbers between 601 and 800, 800 from all numbers between 801 and 
999 and, of course, 000 from all numbers between 000 and 200. All remainders 
greater than 128 and the numbers 000, 200, and so forth, are rejected. Using
columns 05 to 07 in Table 2.1, we get 26, 52, 7, 94, 16, 48, 41, 80, 128, and 92, the
draw requiring 15 three-digit numbers for n = 10. In this sample the rejection rate
5/15 = 33% is close to the probability of rejection 72/200 = 36% for this method.
In using this method with a number N like 384, note that one subtracts 400 from a 
number between 401 and 800, but automatically rejects all numbers greater than 
800. Subtraction of 800 from numbers between 801 and 999 would give a higher 
probability of acceptance to remainders between 001 and 199 than to remainders 
between 200 and 384. 
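
The second method can be sketched as follows. Taking the remainder after division by 200 is equivalent to the subtractions described above; the extra check on the top block is one way to express the warning about numbers such as N = 384. The function name and the `modulus` parameter are illustrative assumptions, and the list holds columns 05-09 of Table 2.1.

```python
# Columns 05-09 of Table 2.1, rows 00-19.
COLS_05_09 = [
    "22662", "85205", "40756", "69440", "81619",
    "98326", "94070", "00015", "84820", "64157",
    "17676", "88040", "37403", "52820", "09243",
    "75993", "03648", "12479", "21472", "77312",
]

def select_by_remainder(entries, N, n, modulus=200):
    """Reduce each three-digit number to a remainder and keep the first n
    distinct remainders between 1 and N. Returns the sample and the number
    of three-digit numbers that had to be read."""
    chosen, used = [], 0
    for entry in entries:
        used += 1
        number = int(entry[:3])
        remainder = number % modulus
        block_start = number - remainder
        # Reject a top block that cannot reach N (e.g. 801-999 when N = 384),
        # since it would favor the smaller remainders.
        if block_start + N > 999:
            continue
        if 1 <= remainder <= N and remainder not in chosen:
            chosen.append(remainder)
        if len(chosen) == n:
            break
    return chosen, used

sample, used = select_by_remainder(COLS_05_09, N=128, n=10)
print(sample, used)
```

For N = 384 the same function would be called with `modulus=400`, in which case every number above 800 is rejected, as the text requires.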

Other methods of sampling are often preferable to simple random sampling on 
the grounds of convenience or of increased precision. Simple random sampling 
serves best to introduce sampling theory. 


2.3 DEFINITIONS AND NOTATION 


In a sample survey we decide on certain properties that we attempt to measure 
and record for every unit that comes into the sample. These properties of the units 
are referred to as characteristics or, more simply, as items.

The values obtained for any specific item in the N units that comprise the
population are denoted by $y_1, y_2, \ldots, y_N$. The corresponding values for the units
in the sample are denoted by $y_1, y_2, \ldots, y_n$ or, if we wish to refer to a typical
sample member, by $y_i$ (i = 1, 2, ..., n). Note that the sample will not consist of the
first n units in the population, except in the instance, usually rare, in which these
units happen to be drawn. If this point is kept in mind, my experience has been that
no confusion need result.

Capital letters refer to characteristics of the population and lowercase letters to 
those of the sample. For totals and means we have the following definitions. 


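         Population                                                              Sample

Total:   $Y = \sum_{i=1}^{N} y_i = y_1 + y_2 + \cdots + y_N$                     $\sum_{i=1}^{n} y_i = y_1 + y_2 + \cdots + y_n$

Mean:    $\bar{Y} = \frac{y_1 + y_2 + \cdots + y_N}{N} = \frac{\sum_{i=1}^{N} y_i}{N}$      $\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n} = \frac{\sum_{i=1}^{n} y_i}{n}$
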
Although sampling is undertaken for many purposes, interest centers most
frequently on four characteristics of the population.


1. Mean = $\bar{Y}$ (e.g., the average number of children per school).

2. Total = Y (e.g., the total number of acres of wheat in a region).

3. Ratio of two totals or means $R = Y/X = \bar{Y}/\bar{X}$ (e.g., ratio of liquid assets to
total assets in a group of families).

4. Proportion of units that fall into some defined class (e.g., proportion of
people with false teeth).


Estimation of the first three quantities is discussed in this chapter. 
The symbol ^ denotes an estimate of a population characteristic made from a 
sample. In this chapter only the simplest estimators are considered. 


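                          Estimator
Population mean $\bar{Y}$     $\hat{\bar{Y}} = \bar{y}$ = sample mean
Population total Y            $\hat{Y} = N\bar{y} = N \sum_{i=1}^{n} y_i / n$
Population ratio R            $\hat{R} = \bar{y}/\bar{x} = \sum_{i=1}^{n} y_i \Big/ \sum_{i=1}^{n} x_i$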


In $\hat{Y}$ the factor N/n by which the sample total is multiplied is sometimes called
the expansion or raising or inflation factor. Its inverse n/N, the ratio of the size of 
the sample to that of the population, is called the sampling fraction and is denoted 
by the letter f. 
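
As a small numerical illustration of these estimators, the sketch below computes the sample mean, the estimated total, the estimated ratio, the sampling fraction, and the expansion factor. The data values and the population size N = 20 are hypothetical, and the function name is illustrative.

```python
def srs_estimates(y, x, N):
    """Simplest estimators from a simple random sample of paired values."""
    n = len(y)
    y_bar = sum(y) / n
    return {
        "mean": y_bar,                    # estimate of the population mean
        "total": N * y_bar,               # estimate of the population total
        "ratio": sum(y) / sum(x),         # estimate of R = Y/X
        "sampling_fraction": n / N,       # f = n/N
        "expansion_factor": N / n,        # N/n
    }

# Hypothetical sample of n = 4 units from a population of N = 20 units.
estimates = srs_estimates(y=[2, 4, 6, 8], x=[1, 2, 3, 4], N=20)
print(estimates)
```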


2.4 PROPERTIES OF THE ESTIMATES 


The precision of any estimate made from a sample depends both on the method 
by which the estimate is calculated from the sample data and on the plan of 
sampling. To save space we sometimes write of “the precision of the sample 
mean” or “the precision of simple random sampling,” without specifically men- 
tioning the other fundamental factor. This has been done, we hope, only in 
instances in which it is clear from the context what the missing factor is. When
studying any formula that is presented, the reader should make sure that he or she 
knows the specific method of sampling and method of estimation for which the 
formula has been established. 

In this book a method of estimation is called consistent if the estimate becomes 
exactly equal to the population value when n =N, that is, when the sample 
consists of the whole population. For simple random sampling it is obvious that $\bar{y}$
and $N\bar{y}$ are consistent estimates of the population mean and total, respectively.

Hansen, Hurwitz, and Madow (1953) and Murthy (1967) give an alternative
definition of consistency, similar to that in classical statistics. An estimator is 
consistent if the probability that it is in error by more than any given amount tends 
to zero as the sample becomes large. Exact statement of this definition requires 
care with complex survey plans. 

As we have seen, a method of estimation is unbiased if the average value of the 
estimate, taken over all possible samples of given size n, is exactly equal to the true 
population value. If the method is to be unbiased without qualification, this result 
must hold for any population of finite values $y_i$ and for any n. To investigate
whether $\bar{y}$ is unbiased with simple random sampling, we calculate the value of $\bar{y}$
for all ${}_N C_n$ samples and find the average of the estimates. The symbol E denotes
this average over all possible samples. 


Theorem 2.1. The sample mean $\bar{y}$ is an unbiased estimate of $\bar{Y}$.
Proof. By its definition

$$E\bar{y} = \frac{\sum \bar{y}}{{}_N C_n} = \frac{\sum (y_1 + y_2 + \cdots + y_n)}{n[N!/n!(N-n)!]} \qquad (2.2)$$

where the sum extends over all ${}_N C_n$ samples. To evaluate this sum, we find out in
how many samples any specific value $y_i$ appears. Since there are (N − 1) other
units available for the rest of the sample and (n − 1) other places to fill in the
sample, the number of samples containing $y_i$ is

$${}_{N-1}C_{n-1} = \frac{(N-1)!}{(n-1)!(N-n)!} \qquad (2.3)$$

Hence

$$\sum (y_1 + y_2 + \cdots + y_n) = \frac{(N-1)!}{(n-1)!(N-n)!} (y_1 + y_2 + \cdots + y_N)$$

From (2.2) this gives

$$E\bar{y} = \frac{(N-1)!}{(n-1)!(N-n)!} \cdot \frac{n!(N-n)!}{nN!} (y_1 + y_2 + \cdots + y_N) = \frac{(y_1 + y_2 + \cdots + y_N)}{N} = \bar{Y} \qquad (2.4)$$


Corollary. $\hat{Y} = N\bar{y}$ is an unbiased estimate of the population total Y.
A less cumbersome proof of theorem 2.1 is obtained as follows. Since every unit
appears in the same number of samples, it is clear that

$$E(y_1 + y_2 + \cdots + y_n) \text{ must be some multiple of } y_1 + y_2 + \cdots + y_N \qquad (2.5)$$

The multiplier must be n/N, since the expression on the left has n terms and that
on the right has N terms. This leads to the result.
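
Theorem 2.1 can be checked by brute force on a small population by listing every possible sample and averaging the sample means. The population values below are hypothetical and serve only as an illustration.

```python
from fractions import Fraction
from itertools import combinations

population = [3, 7, 8, 12, 20]   # hypothetical values y_1, ..., y_N
n = 2
N = len(population)

# All N-choose-n samples, each equally likely under simple random sampling.
samples = list(combinations(population, n))
sample_means = [Fraction(sum(s), n) for s in samples]

expected_mean = sum(sample_means) / len(samples)   # E(y-bar) over all samples
population_mean = Fraction(sum(population), N)     # Y-bar

print(len(samples), expected_mean, population_mean)
```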
2.5 VARIANCES OF THE ESTIMATES


The variance of the $y_i$ in a finite population is usually defined as

$$\sigma^2 = \frac{\sum_{i=1}^{N} (y_i - \bar{Y})^2}{N} \qquad (2.6)$$

As a matter of notation, results are presented in terms of a slightly different
expression, in which the divisor (N − 1) is used instead of N. We take

$$S^2 = \frac{\sum_{i=1}^{N} (y_i - \bar{Y})^2}{N-1} \qquad (2.7)$$

This convention has been used by those who approach sampling theory by means
of the analysis of variance. Its advantage is that most results take a slightly simpler
form. Provided that the same notation is maintained consistently, all results are
equivalent in either notation.

We now consider the variance of $\bar{y}$. By this we mean $E(\bar{y} - \bar{Y})^2$ taken over all
${}_N C_n$ samples.


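Theorem 2.2. The variance of the mean $\bar{y}$ from a simple random sample is

$$V(\bar{y}) = E(\bar{y} - \bar{Y})^2 = \frac{S^2}{n} \frac{(N-n)}{N} = \frac{S^2}{n}(1 - f) \qquad (2.8)$$

where f = n/N is the sampling fraction.
Proof.

$$n(\bar{y} - \bar{Y}) = (y_1 - \bar{Y}) + (y_2 - \bar{Y}) + \cdots + (y_n - \bar{Y}) \qquad (2.9)$$

By the argument of symmetry used in relation (2.5), it follows that

$$E[(y_1 - \bar{Y})^2 + \cdots + (y_n - \bar{Y})^2] = \frac{n}{N}[(y_1 - \bar{Y})^2 + \cdots + (y_N - \bar{Y})^2] \qquad (2.10)$$

and also that

$$E[(y_1 - \bar{Y})(y_2 - \bar{Y}) + (y_1 - \bar{Y})(y_3 - \bar{Y}) + \cdots + (y_{n-1} - \bar{Y})(y_n - \bar{Y})]$$
$$= \frac{n(n-1)}{N(N-1)}[(y_1 - \bar{Y})(y_2 - \bar{Y}) + (y_1 - \bar{Y})(y_3 - \bar{Y}) + \cdots + (y_{N-1} - \bar{Y})(y_N - \bar{Y})] \qquad (2.11)$$

In (2.11) the sums of products extend over all pairs of units in the sample and
population, respectively. The sum on the left contains n(n − 1)/2 terms and that
on the right contains N(N − 1)/2 terms.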
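
Now square (2.9) and average over all simple random samples. Using (2.10) and
(2.11), we obtain

$$n^2 E(\bar{y} - \bar{Y})^2 = \frac{n}{N}\left\{(y_1 - \bar{Y})^2 + \cdots + (y_N - \bar{Y})^2 + \frac{2(n-1)}{N-1}[(y_1 - \bar{Y})(y_2 - \bar{Y}) + \cdots + (y_{N-1} - \bar{Y})(y_N - \bar{Y})]\right\}$$

Completing the square on the cross-product term, we have

$$n^2 E(\bar{y} - \bar{Y})^2 = \frac{n}{N}\left\{\left(1 - \frac{n-1}{N-1}\right)[(y_1 - \bar{Y})^2 + \cdots + (y_N - \bar{Y})^2] + \frac{n-1}{N-1}[(y_1 - \bar{Y}) + \cdots + (y_N - \bar{Y})]^2\right\}$$

The second term inside the curly bracket vanishes, since the sum of the $y_i$ equals
$N\bar{Y}$. Division by $n^2$ gives

$$V(\bar{y}) = E(\bar{y} - \bar{Y})^2 = \frac{N-n}{nN(N-1)} \sum_{i=1}^{N} (y_i - \bar{Y})^2 = \frac{S^2}{n} \frac{(N-n)}{N}$$

This completes the proof.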
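Corollary 1. The standard error of $\bar{y}$ is

$$\sigma_{\bar{y}} = \frac{S}{\sqrt{n}} \sqrt{(N-n)/N} = \frac{S}{\sqrt{n}} \sqrt{1-f} \qquad (2.12)$$

Corollary 2. The variance of $\hat{Y} = N\bar{y}$, as an estimate of the population total Y, is

$$V(\hat{Y}) = E(\hat{Y} - Y)^2 = \frac{N^2 S^2}{n} \frac{(N-n)}{N} = \frac{N^2 S^2}{n}(1-f) \qquad (2.13)$$

Corollary 3. The standard error of $\hat{Y}$ is

$$\sigma_{\hat{Y}} = \frac{NS}{\sqrt{n}} \sqrt{(N-n)/N} = \frac{NS}{\sqrt{n}} \sqrt{1-f} \qquad (2.14)$$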
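
Formula (2.8) can be compared with a direct enumeration of all possible samples from a small population. The sketch below uses exact fractions; the population values and function names are hypothetical and serve only as an illustration.

```python
from fractions import Fraction
from itertools import combinations

def variance_of_mean_by_enumeration(population, n):
    """E(y-bar - Y-bar)^2 averaged over all N-choose-n samples."""
    N = len(population)
    Y_bar = Fraction(sum(population), N)
    means = [Fraction(sum(s), n) for s in combinations(population, n)]
    return sum((m - Y_bar) ** 2 for m in means) / len(means)

def variance_of_mean_by_formula(population, n):
    """(S^2 / n) * (N - n) / N, with S^2 using the divisor N - 1."""
    N = len(population)
    Y_bar = Fraction(sum(population), N)
    S2 = sum((y - Y_bar) ** 2 for y in population) / (N - 1)
    return S2 / n * Fraction(N - n, N)

population = [3, 7, 8, 12, 20]   # hypothetical values
for n in (2, 3, 4):
    print(n, variance_of_mean_by_enumeration(population, n),
          variance_of_mean_by_formula(population, n))
```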


2.6 THE FINITE POPULATION CORRECTION 


For a random sample of size n from an infinite population, it is well known that
the variance of the mean is $\sigma^2/n$. The only change in this result when the
population is finite is the introduction of the factor (N − n)/N. The factors
(N − n)/N for the variance and $\sqrt{(N-n)/N}$ for the standard error are called the
finite population corrections (fpc). They are given with a divisor (N − 1) in place of
N by writers who present results in terms of $\sigma$. Provided that the sampling fraction
n/N remains low, these factors are close to unity, and the size of the population as
such has no direct effect on the standard error of the sample mean. For instance, if
S is the same in the two populations, a sample of 500 from a population of 200,000
gives almost as precise an estimate of the population mean as a sample of 500 from 
a population of 10,000. Persons unfamiliar with sampling often find this result 
difficult to believe and, indeed, it is remarkable. To them it seems intuitively 
obvious that if information has been obtained about only a very small fraction of 
the population, the sample mean cannot be accurate. It is instructive for the reader 
to consider why this point of view is erroneous. 

In practice the fpc can be ignored whenever the sampling fraction does not 
exceed 5% and for many purposes even if it is as high as 10%. The effect of 
ignoring the correction is to overestimate the standard error of the estimate $\bar{y}$.
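
The comparison of a sample of 500 from populations of 200,000 and 10,000 can be made concrete with formula (2.12). The sketch below takes S = 1 purely for illustration; only the ratio of the two standard errors matters, and the function name is an assumption.

```python
import math

def standard_error_of_mean(S, n, N=None):
    """(S / sqrt(n)) * sqrt(1 - n/N); with N=None the fpc is ignored."""
    se = S / math.sqrt(n)
    if N is not None:
        se *= math.sqrt(1.0 - n / N)
    return se

S, n = 1.0, 500                                   # S = 1 is an arbitrary scale
se_large = standard_error_of_mean(S, n, N=200_000)
se_small = standard_error_of_mean(S, n, N=10_000)
se_no_fpc = standard_error_of_mean(S, n)

print(se_large, se_small, se_small / se_large)
print(se_no_fpc)                                  # fpc ignored
```

With a sampling fraction of 5% in the smaller population, the two standard errors differ by only a few percent, and the value computed without the fpc is the largest of the three.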

The following theorem, which is an extension of theorem 2.2, is not required for 
the discussion in this chapter, but it is proved here for later reference. 


Theorem 2.3. If $y_i$, $x_i$ are a pair of variates defined on every unit in the
population and $\bar{y}$, $\bar{x}$ are the corresponding means from a simple random sample of
size n, then their covariance

$$E(\bar{y} - \bar{Y})(\bar{x} - \bar{X}) = \frac{N-n}{nN} \frac{1}{N-1} \sum_{i=1}^{N} (y_i - \bar{Y})(x_i - \bar{X}) \qquad (2.15)$$

This theorem reduces to theorem 2.2 if the variates $y_i$, $x_i$ are equal on every unit.
Proof. Apply theorem 2.2 to the variate $u_i = y_i + x_i$. The population mean of $u_i$
is $\bar{U} = \bar{Y} + \bar{X}$, and theorem 2.2 gives

$$E(\bar{u} - \bar{U})^2 = \frac{N-n}{nN} \frac{1}{N-1} \sum_{i=1}^{N} (u_i - \bar{U})^2$$

that is

$$E[(\bar{y} - \bar{Y}) + (\bar{x} - \bar{X})]^2 = \frac{N-n}{nN} \frac{1}{N-1} \sum_{i=1}^{N} [(y_i - \bar{Y}) + (x_i - \bar{X})]^2 \qquad (2.16)$$

Expand the quadratic terms on both sides. By theorem 2.2,

$$E(\bar{y} - \bar{Y})^2 = \frac{N-n}{nN} \frac{1}{N-1} \sum_{i=1}^{N} (y_i - \bar{Y})^2$$

with a similar relation for $E(\bar{x} - \bar{X})^2$. Hence these two terms cancel on the left and
right sides of (2.16). The result of the theorem (equation 2.15) follows from the
cross-product terms.
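
Equation (2.15) can also be checked by enumeration. In the sketch below each sample is represented by the indices of its units so that the paired values $y_i$, $x_i$ stay together; the data and function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def covariance_by_enumeration(y, x, n):
    """E(y-bar - Y-bar)(x-bar - X-bar) averaged over all samples of size n."""
    N = len(y)
    Y_bar, X_bar = Fraction(sum(y), N), Fraction(sum(x), N)
    samples = list(combinations(range(N), n))
    total = sum(
        (Fraction(sum(y[i] for i in s), n) - Y_bar)
        * (Fraction(sum(x[i] for i in s), n) - X_bar)
        for s in samples
    )
    return total / len(samples)

def covariance_by_formula(y, x, n):
    """Right-hand side of (2.15)."""
    N = len(y)
    Y_bar, X_bar = Fraction(sum(y), N), Fraction(sum(x), N)
    cross = sum((yi - Y_bar) * (xi - X_bar) for yi, xi in zip(y, x))
    return Fraction(N - n, n * N) * cross / (N - 1)

y = [3, 7, 8, 12, 20]    # hypothetical paired values
x = [1, 4, 2, 9, 6]
print(covariance_by_enumeration(y, x, 3), covariance_by_formula(y, x, 3))
```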


2.7 ESTIMATION OF THE STANDARD ERROR 
FROM A SAMPLE 


The formulas for the standard errors of the estimated population mean and 
total are used primarily for three purposes: (1) to compare the precision obtained 
by simple random sampling with that given by other methods of sampling, (2) to 
estimate the size of the sample needed in a survey that is being planned, and (3) to
estimate the precision actually attained in a survey that has been completed. The
formulas involve $S^2$, the population variance. In practice this will not be known,
but it can be estimated from the sample data. The relevant result is stated in
theorem 2.4.


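Theorem 2.4. For a simple random sample

$$s^2 = \frac{\sum_{i=1}^{n}(y_i-\bar{y})^2}{n-1}$$

is an unbiased estimate of

$$S^2 = \frac{\sum_{i=1}^{N}(y_i-\bar{Y})^2}{N-1}$$

Proof. We may write

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}[(y_i-\bar{Y})-(\bar{y}-\bar{Y})]^2 \tag{2.17}$$

$$= \frac{1}{n-1}\left[\sum_{i=1}^{n}(y_i-\bar{Y})^2 - n(\bar{y}-\bar{Y})^2\right] \tag{2.18}$$

Now average over all simple random samples of size n. By the argument of
symmetry used in theorem 2.2,

$$E\left[\sum_{i=1}^{n}(y_i-\bar{Y})^2\right] = \frac{n}{N}\sum_{i=1}^{N}(y_i-\bar{Y})^2 = \frac{n(N-1)}{N}S^2$$

by the definition of $S^2$. Furthermore, by theorem 2.2,

$$E[n(\bar{y}-\bar{Y})^2] = \frac{N-n}{N}S^2$$

Hence

$$E(s^2) = \frac{S^2}{(n-1)N}[n(N-1)-(N-n)] = S^2 \tag{2.19}$$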
Corollary. Unbiased estimates of the variances of $\bar{y}$ and $\hat{Y} = N\bar{y}$ are

$$v(\bar{y}) = s_{\bar{y}}^2 = \frac{s^2}{n}\left(\frac{N-n}{N}\right) = \frac{s^2}{n}(1-f) \tag{2.20}$$

$$v(\hat{Y}) = s_{\hat{Y}}^2 = \frac{N^2s^2}{n}\left(\frac{N-n}{N}\right) = \frac{N^2s^2}{n}(1-f) \tag{2.21}$$

For the standard errors we take

$$s_{\bar{y}} = \frac{s}{\sqrt{n}}\sqrt{1-f}, \qquad s_{\hat{Y}} = \frac{Ns}{\sqrt{n}}\sqrt{1-f} \tag{2.22}$$

These estimates are slightly biased; for most applications the bias is unimportant.
The reader should note the symbols employed for true and estimated variances
of the estimates. Thus, for $\bar{y}$, we write

True variance: $V(\bar{y}) = \sigma_{\bar{y}}^2$

Estimated variance: $v(\bar{y}) = s_{\bar{y}}^2$
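
The unbiasedness stated in theorem 2.4 can be verified by brute force on a small population. The following Python sketch averages $s^2$ over every simple random sample and compares the result with $S^2$; the population values and the function names are hypothetical, used only for illustration.

```python
from fractions import Fraction
from itertools import combinations

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by (number of values - 1)."""
    m = Fraction(sum(values), len(values))
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def expected_s2(population, n):
    """Average of s^2 over all simple random samples of size n."""
    samples = list(combinations(population, n))
    return sum(sample_variance(s) for s in samples) / len(samples)

# Hypothetical population, chosen only for illustration.
population = [2, 5, 6, 9, 13]
S2 = sample_variance(population)   # population variance S^2 (divisor N - 1)
print(expected_s2(population, 3) == S2)
```

Exact fractions are used so that the comparison is an equality rather than an approximate floating-point check.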


2.8 CONFIDENCE LIMITS 


It is usually assumed that the estimates $\bar{y}$ and $\hat{Y}$ are normally distributed about
the corresponding population values. The reasons for this assumption and its
limitations are considered in section 2.15. If the assumption holds, lower and
upper confidence limits for the population mean and total are as follows:

Mean:

$$\hat{\bar{Y}}_L = \bar{y} - \frac{ts}{\sqrt{n}}\sqrt{1-f}, \qquad \hat{\bar{Y}}_U = \bar{y} + \frac{ts}{\sqrt{n}}\sqrt{1-f} \tag{2.23}$$

Total:

$$\hat{Y}_L = N\bar{y} - \frac{tNs}{\sqrt{n}}\sqrt{1-f}, \qquad \hat{Y}_U = N\bar{y} + \frac{tNs}{\sqrt{n}}\sqrt{1-f} \tag{2.24}$$

The symbol t is the value of the normal deviate corresponding to the desired
confidence probability. The most common values are

Confidence probability (%)    50      80      90      95      99
t                             0.67    1.28    1.64    1.96    2.58

If the sample size is less than 50, the percentage points may be taken from
Student's t table with $(n-1)$ degrees of freedom, these being the degrees of
freedom in the estimated variance $s^2$. The t distribution holds exactly only if the
observations $y_i$ are themselves normally distributed and N is infinite. Moderate
departures from normality do not affect it greatly. For small samples with very
skew distributions, special methods are needed.


Example. Signatures to a petition were collected on 676 sheets. Each sheet had enough
space for 42 signatures, but on many sheets a smaller number of signatures had been
collected. The numbers of signatures per sheet were counted on a random sample of 50
sheets (about a 7% sample), with the results shown in Table 2.2.

Estimate the total number of signatures to the petition and the 80% confidence limits.
The sampling unit is a sheet, and the observations $y_i$ are the numbers of signatures per
sheet. Since about half the sheets had the maximum number of signatures, 42, the data are
presented as a frequency distribution. Note that the original distribution appears to be far
from normal, the greatest frequency being at the upper end. Nevertheless, there is reason to
believe from experience that the means of samples of 50 are approximately normally
distributed.

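We find

$$n = \sum f_i = 50, \qquad y = \sum f_i y_i = 1471, \qquad \sum f_i y_i^2 = 54{,}497$$

Hence the estimated total number of signatures is

$$\hat{Y} = N\bar{y} = \frac{(676)(1471)}{50} = 19{,}888$$

For the sample variance $s^2$ we have

$$s^2 = \frac{1}{n-1}\sum f_i(y_i-\bar{y})^2 = \frac{1}{n-1}\left[\sum f_i y_i^2 - \frac{(\sum f_i y_i)^2}{\sum f_i}\right] = \frac{1}{49}\left[54{,}497 - \frac{(1471)^2}{50}\right] = 229.0$$

From (2.22) the 80% confidence limits are

$$19{,}888 \pm \frac{tNs}{\sqrt{n}}\sqrt{1-f} = 19{,}888 \pm \frac{(1.28)(676)(15.13)\sqrt{1-0.0740}}{\sqrt{50}}$$

This gives 18,107 and 21,669 for the 80% limits. A complete count showed 21,045
signatures.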


TABLE 2.2
RESULTS FOR A SAMPLE OF 50 PETITION SHEETS
$y_i$ = NUMBER OF SIGNATURES; $f_i$ = FREQUENCY

$y_i$ | 42   41   36   32   29   27   23   19   16   15   …
$f_i$ | 23    4    1    1    1    2    1    1    2    2   …
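
The arithmetic of the petition example can be reproduced from the three sample summaries quoted in the text. The Python sketch below is one way to do so; the function name `total_confidence_limits` and its argument names are ours, not the book's, and the normal deviate t = 1.28 is the 80% value from the table in this section.

```python
import math

def total_confidence_limits(N, n, sum_y, sum_y2, t):
    """Estimated population total and normal-theory limits, following (2.22) and (2.24).

    N, n    : population and sample sizes
    sum_y   : sample total of the observations
    sum_y2  : sample (uncorrected) sum of squares
    t       : normal deviate for the desired confidence probability
    """
    ybar = sum_y / n
    s2 = (sum_y2 - sum_y ** 2 / n) / (n - 1)   # sample variance s^2
    f = n / N                                   # sampling fraction
    se_total = N * math.sqrt(s2 / n) * math.sqrt(1 - f)
    total = N * ybar
    return total, total - t * se_total, total + t * se_total

total, lower, upper = total_confidence_limits(
    N=676, n=50, sum_y=1471, sum_y2=54497, t=1.28
)
print(round(total), round(lower), round(upper))
```

Because the sketch carries full precision rather than the rounded values 19,888 and 15.13, its limits may differ from those quoted in the example by about one signature.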


2.9 AN ALTERNATIVE METHOD OF PROOF 


Cornfield (1944) suggested a method of proving the principal results for simple
random sampling without replacement that enables us to use standard results
from infinite population theory. Let $a_i$ be a random variate that takes the value 1 if
the ith unit is in the sample and the value 0 otherwise. The sample mean $\bar{y}$ may be
written

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{N} a_i y_i \tag{2.25}$$

where the sum extends over all N units in the population. In this expression the $a_i$
are random variables and the $y_i$ are a set of fixed numbers.

Clearly

$$\Pr(a_i = 1) = \frac{n}{N}, \qquad \Pr(a_i = 0) = 1 - \frac{n}{N}$$

Thus $a_i$ is distributed as a binomial variate in a single trial, with P = n/N. Hence

$$E(a_i) = P = \frac{n}{N}, \qquad V(a_i) = PQ = \frac{n}{N}\left(1-\frac{n}{N}\right) \tag{2.26}$$


To find $V(\bar{y})$ we need also the covariance of $a_i$ and $a_j$. The product $a_i a_j$ is 1 if the
ith and jth unit are both in the sample and is zero otherwise. The probability that
two specific units are both in the sample is easily found to be $n(n-1)/N(N-1)$.
Hence

$$\operatorname{Cov}(a_i a_j) = E(a_i a_j) - E(a_i)E(a_j) = \frac{n(n-1)}{N(N-1)} - \left(\frac{n}{N}\right)^2 = -\frac{n}{N(N-1)}\left(1-\frac{n}{N}\right) \tag{2.27}$$


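Applying this approach to find $V(\bar{y})$, we have, from (2.25),

$$V(\bar{y}) = \frac{1}{n^2}\left[\sum_{i=1}^{N} y_i^2\,V(a_i) + 2\sum_{i<j}^{N} y_i y_j \operatorname{Cov}(a_i a_j)\right] \tag{2.28}$$

$$= \frac{1-f}{nN}\left[\sum_{i=1}^{N} y_i^2 - \frac{2}{N-1}\sum_{i<j}^{N} y_i y_j\right] \tag{2.29}$$

using (2.26) and (2.27). Completing the square on the cross-product term gives

$$V(\bar{y}) = \frac{1-f}{nN}\left[\frac{N}{N-1}\sum_{i=1}^{N} y_i^2 - \frac{1}{N-1}\left(\sum_{i=1}^{N} y_i\right)^2\right] \tag{2.30}$$

$$= \frac{1-f}{n(N-1)}\sum_{i=1}^{N}(y_i-\bar{Y})^2 = \frac{(1-f)S^2}{n} \tag{2.31}$$

The method gives easy proofs of theorems 2.3 and 2.4. It may be used to find
higher moments of the distribution of $\bar{y}$, although for this purpose a method given
by Tukey (1950), with further development by Wishart (1952), is more powerful.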
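
The moments of the indicator variates in (2.26) and (2.27) can be confirmed by enumeration. The Python sketch below lists all simple random samples for a small hypothetical N and n and compares the exact moments with the formulas; the function name `indicator_moments` is an illustrative choice, not notation from the text.

```python
from fractions import Fraction
from itertools import combinations

def indicator_moments(N, n):
    """Exact E(a_i), V(a_i) and Cov(a_i, a_j) for the inclusion indicators,
    obtained by listing every simple random sample of size n from N units."""
    samples = list(combinations(range(N), n))
    total = len(samples)
    # By symmetry it is enough to look at units 0 and 1.
    p_i = Fraction(sum(1 for s in samples if 0 in s), total)
    p_ij = Fraction(sum(1 for s in samples if 0 in s and 1 in s), total)
    mean = p_i
    var = p_i * (1 - p_i)      # a_i takes only the values 0 and 1
    cov = p_ij - p_i * p_i
    return mean, var, cov

N, n = 6, 3
mean, var, cov = indicator_moments(N, n)
f = Fraction(n, N)
print(mean == f)                                   # first part of (2.26)
print(var == f * (1 - f))                          # second part of (2.26)
print(cov == -Fraction(n, N * (N - 1)) * (1 - f))  # (2.27)
```

The negative covariance reflects the fact that, without replacement, including one unit makes it slightly less likely that any other given unit is included.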


2.10 RANDOM SAMPLING WITH REPLACEMENT 


A similar approach applies when sampling is with replacement. In this event the
ith unit may appear 0, 1, 2, ..., n times in the sample. Let $t_i$ be the number of
times that the ith unit appears in the sample. Then

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{N} t_i y_i \tag{2.32}$$

Since the probability that the ith unit is drawn is 1/N at each draw, the variate
$t_i$ is distributed as a binomial number of successes out of n trials with p = 1/N.
Hence

$$E(t_i) = \frac{n}{N}, \qquad V(t_i) = n\left(\frac{1}{N}\right)\left(1-\frac{1}{N}\right) \tag{2.33}$$

Jointly, the variates $t_i$ follow a multinomial distribution. For this,

$$\operatorname{Cov}(t_i t_j) = -\frac{n}{N^2} \tag{2.34}$$


Using (2.32), (2.33), and (2.34), we have, for sampling with replacement,

$$V(\bar{y}) = \frac{1}{n^2}\left[\sum_{i=1}^{N} y_i^2\,\frac{n(N-1)}{N^2} - 2\sum_{i<j}^{N} y_i y_j\,\frac{n}{N^2}\right] \tag{2.35}$$

$$= \frac{1}{nN}\sum_{i=1}^{N}(y_i-\bar{Y})^2 = \frac{\sigma^2}{n} = \frac{N-1}{N}\,\frac{S^2}{n} \tag{2.36}$$

Consequently, $V(\bar{y})$ in sampling without replacement is only $(N-n)/(N-1)$
times its value in sampling with replacement. If instead of $\bar{y}$ the mean $\bar{y}_d$ of the
different or distinct units in the sample is used as an estimate when sampling is with
replacement, Murthy (1967) has shown that the leading term in the average
variance of $\bar{y}_d$ is $(1-f/2)S^2/n$, following work by Basu (1958) and Des Raj and
Khamis (1958). In some applications the cost of measuring the distinct units in the
sample may be predominating, so that the cost of the sample is proportional to the
number of distinct units. In this situation, Seth and J. N. K. Rao (1964) showed
that for given average cost, $V(\bar{y})$ in sampling without replacement is less than
$V(\bar{y}_d)$ in sampling with replacement. They also prove the more general result that
if $\bar{y}_d' = f(\nu)\bar{y}_d/Ef(\nu)$, where $\nu$ is the number of distinct units in the sample and $f(\nu)$
is a function of $\nu$, then $V(\bar{y}) < V(\bar{y}_d')$ if $S^2 < N\bar{Y}^2$, a condition satisfied by nearly all
populations encountered in sample surveys.
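
The comparison between the two sampling schemes can be illustrated by enumerating every possible sample from a small population. The Python sketch below is a minimal check of (2.36) and of the factor $(N-n)/(N-1)$; the population values and the helper name `var_of_mean` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

def var_of_mean(samples, Ybar):
    """Exact variance of the sample mean over a list of equally likely samples."""
    return sum((Fraction(sum(s), len(s)) - Ybar) ** 2 for s in samples) / len(samples)

# Hypothetical population, chosen only for illustration.
population = [2, 5, 6, 9, 13]
N, n = len(population), 2
Ybar = Fraction(sum(population), N)
sigma2 = sum((y - Ybar) ** 2 for y in population) / N   # divisor N

with_repl = list(product(population, repeat=n))     # N**n ordered samples, equally likely
without_repl = list(combinations(population, n))    # C(N, n) samples, equally likely

v_with = var_of_mean(with_repl, Ybar)
v_without = var_of_mean(without_repl, Ybar)
print(v_with == sigma2 / n)                           # (2.36)
print(v_without == Fraction(N - n, N - 1) * v_with)   # factor (N - n)/(N - 1)
```

Ordered samples are used for the with-replacement case because the $N^n$ ordered outcomes, not the unordered ones, are equally likely.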


2.11 ESTIMATION OF A RATIO 


Frequently the quantity that is to be estimated from a simple random sample
is the ratio of two variables both of which vary from unit to unit. In a household
survey examples are the average number of suits of clothes per adult male, the
average expenditure on cosmetics per adult female, and the average number of
hours per week spent watching television per child aged 10 to 15. In order to
estimate the first of these items, we would record for the ith household (i =
1, 2, ..., n) the number of adult males $x_i$ who live there and the total number of
suits $y_i$ that they possess. The population parameter to be estimated is the ratio

$$R = \frac{\text{total number of suits}}{\text{total number of adult males}} = \frac{\sum_{i=1}^{N} y_i}{\sum_{i=1}^{N} x_i} \tag{2.37}$$

The corresponding sample estimate is

$$\hat{R} = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i} = \frac{\bar{y}}{\bar{x}} \tag{2.38}$$


Examples of this kind occur frequently when the sampling unit (the household) 
comprises a group or cluster of elements (adult males) and our interest is in the 
population mean per element. Ratios also appear in many other applications, for 
example, the ratio of loans for building purposes to total loans in a bank or the 
ratio of acres of wheat to total acres on a farm. 

The sampling distribution of $\hat{R}$ is more complicated than that of $\bar{y}$ because both
the numerator $\bar{y}$ and the denominator $\bar{x}$ vary from sample to sample. In small
samples the distribution of $\hat{R}$ is skew and $\hat{R}$ is usually a slightly biased estimate of
R. In large samples the distribution of $\hat{R}$ tends to normality and the bias becomes
negligible. The following approximate result will serve for most purposes; the
distribution of $\hat{R}$ is studied in more detail in Chapter 6.


Theorem 2.5. If variates $y_i$, $x_i$ are measured on each unit of a simple random
sample of size n, assumed large, the MSE and variance of $\hat{R} = \bar{y}/\bar{x}$ are each
approximately

$$\mathrm{MSE}(\hat{R}) \doteq V(\hat{R}) \doteq \frac{1-f}{n\bar{X}^2}\,\frac{\sum_{i=1}^{N}(y_i - Rx_i)^2}{N-1} \tag{2.39}$$

where $R = \bar{Y}/\bar{X}$ is the ratio of the population means and f = n/N.
Proof.

$$\hat{R} - R = \frac{\bar{y}}{\bar{x}} - R = \frac{\bar{y} - R\bar{x}}{\bar{x}} \tag{2.40}$$

If n is large, $\bar{x}$ should not differ greatly from $\bar{X}$. In order to avoid having to work
out the distribution of the ratio of two random variables $(\bar{y} - R\bar{x})$ and $\bar{x}$, we replace
$\bar{x}$ by $\bar{X}$ in the denominator of (2.40) as an approximation. This gives

$$\hat{R} - R \doteq \frac{\bar{y} - R\bar{x}}{\bar{X}} \tag{2.41}$$

Now average over all simple random samples of size n.

$$E(\hat{R} - R) \doteq \frac{E(\bar{y} - R\bar{x})}{\bar{X}} = \frac{\bar{Y} - R\bar{X}}{\bar{X}} = 0 \tag{2.42}$$

since $R = \bar{Y}/\bar{X}$. This shows that to the order of approximation used here $\hat{R}$ is an
unbiased estimate of R.


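From (2.41) we also obtain the result

$$\mathrm{MSE}(\hat{R}) = E(\hat{R} - R)^2 \doteq \frac{1}{\bar{X}^2}E(\bar{y} - R\bar{x})^2 \tag{2.43}$$

The quantity $\bar{y} - R\bar{x}$ is the sample mean of the variate $d_i = y_i - Rx_i$, whose
population mean $\bar{D} = \bar{Y} - R\bar{X} = 0$. Hence we can find $V(\hat{R})$ by applying theorem
2.2 for the variance of the mean of a simple random sample to the variate $d_i$ and
dividing by $\bar{X}^2$. This gives

$$V(\hat{R}) \doteq \frac{1}{\bar{X}^2}E(\bar{d} - \bar{D})^2 = \frac{1}{\bar{X}^2}\,\frac{S_d^2}{n}(1-f) \tag{2.44}$$

$$= \frac{1-f}{n\bar{X}^2}\,\frac{\sum_{i=1}^{N}(d_i - \bar{D})^2}{N-1} = \frac{1-f}{n\bar{X}^2}\,\frac{\sum_{i=1}^{N}(y_i - Rx_i)^2}{N-1} \tag{2.45}$$

This completes the proof.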

The way in which theorem 2.5 was proved is worth noting. It was shown that the
formula in theorem 2.2 for the variance of the sample mean $\bar{y}$ gives the formula for
the approximate variance of the ratio $\bar{y}/\bar{x}$, if the variate $y_i$ is replaced by the
variate $(y_i - Rx_i)/\bar{X}$. The same result, or its natural extension, holds also in more
complex sampling situations and is used frequently later in this book.

As a sample estimate of

$$\frac{\sum_{i=1}^{N}(y_i - Rx_i)^2}{N-1}$$

it is customary to take

$$\frac{\sum_{i=1}^{n}(y_i - \hat{R}x_i)^2}{n-1}$$

This estimate can be shown to have a bias of order 1/n.
For the estimated standard error of $\hat{R}$, this gives

$$s(\hat{R}) = \frac{\sqrt{1-f}}{\sqrt{n}\,\bar{X}}\sqrt{\frac{\sum_{i=1}^{n}(y_i - \hat{R}x_i)^2}{n-1}} \tag{2.46}$$

If $\bar{X}$ is not known, the sample estimate $\bar{x}$ is substituted in the denominator.
One way to compute $s(\hat{R})$ is to express it as

$$s(\hat{R}) = \frac{\sqrt{1-f}}{\sqrt{n}\,\bar{x}}\sqrt{\frac{\sum y_i^2 - 2\hat{R}\sum y_i x_i + \hat{R}^2\sum x_i^2}{n-1}} \tag{2.47}$$

Example. Table 2.3 shows the number of persons ($x_1$), the weekly family income ($x_2$),
and the weekly expenditure on food (y) in a simple random sample of 33 low-income
families. Since the sample is small, the data are intended only to illustrate the calculations.

Estimate from the sample (a) the mean weekly expenditure on food per family, (b) the
mean weekly expenditure on food per person, and (c) the percentage of the income that is
spent on food. Compute the standard errors of these estimates.

Weekly Expenditure on Food per Family. This is the ordinary sample mean

$$\bar{y} = \frac{\sum y}{n} = \frac{907.2}{33} = \$27.49$$

By theorem 2.2 (ignoring the fpc), its standard error is

$$s_{\bar{y}} = \frac{1}{\sqrt{n(n-1)}}\sqrt{\sum y^2 - \frac{(\sum y)^2}{n}} = \frac{1}{\sqrt{(33)(32)}}\sqrt{28{,}224 - (907.2)^2/33} = \$1.76$$

(The uncorrected sum of squares 28,224 is given underneath Table 2.3.)

Weekly Expenditure on Food per Person. Since the size of family varies, the estimate is a
ratio of two variables,

$$\hat{R}_1 = \frac{\bar{y}}{\bar{x}_1} = \frac{907.2}{123} = \$7.38 \text{ per person}$$

The sums of squares and products needed to compute $s(\hat{R}_1)$ by (2.47) are found under
Table 2.3. We need in addition

$$2\hat{R}_1 = 14.7512, \qquad \hat{R}_1^2 = 54.3996, \qquad \bar{x}_1 = 3.7273$$

Extra decimals are carried in $\hat{R}_1$, $2\hat{R}_1$, $\hat{R}_1^2$ to preserve accuracy.
Hence, from (2.47),

$$s(\hat{R}_1) = \frac{1}{\sqrt{33}\,(3.7273)}\sqrt{\frac{(28{,}224) - (14.7512)(3595.5) + (54.3996)(533)}{32}} = \$0.534$$


Percentage of Income Spent on Food. This again is a ratio of two variables

$$\hat{R}_2 = 100\,\frac{\sum y}{\sum x_2} = \frac{(100)(907.2)}{2394} = 37.9\%$$

By (2.47) the reader may verify that the standard error is 2.38%.

TABLE 2.3
SIZE, WEEKLY INCOME, AND FOOD COST OF 33 FAMILIES

Family   Size    Income   Food cost  |  Family   Size    Income   Food cost
number   $x_1$   $x_2$    y          |  number   $x_1$   $x_2$    y
   1       2       62      14.3      |    18       4       83      36.0
   2       3       62      20.8      |    19       2       85      20.6
   3       3       87      22.7      |    20       4       73      27.7
   4       5       65      30.5      |    21       2       66      25.9
   5       4       58      41.2      |    22       5       58      23.3
   6       7       92      28.2      |    23       3       77      39.8
   7       2       88      24.2      |    24       4       69      16.8
   8       4       79      30.0      |    25       7       65      37.8
   9       2       83      24.2      |    26       3       77      34.8
  10       5       62      44.4      |    27       3       69      28.7
  11       3       63      13.4      |    28       6       95      63.0
  12       6       62      19.8      |    29       2       77      19.5
  13       4       60      29.4      |    30       2       69      21.6
  14       4       75      27.1      |    31       6       69      18.2
  15       2       90      22.2      |    32       4       67      20.1
  16       5       75      37.7      |    33       2       63      20.7
  17       3       69      22.6      |
                                        Total     123     2394     907.2

$$\sum x_1^2 = 533, \qquad \sum x_2^2 = 177{,}254, \qquad \sum y^2 = 28{,}224$$

$$\sum x_1 y = 3595.5, \qquad \sum x_2 y = 66{,}678$$
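
The two ratio estimates in this example, and their standard errors by (2.47), need only the sums printed under Table 2.3. The Python sketch below packages the calculation; the function name `ratio_and_se` and its argument names are ours, and the fpc is ignored (f = 0) as in the worked example.

```python
import math

def ratio_and_se(n, sum_y, sum_x, sum_y2, sum_xy, sum_x2, f=0.0):
    """Ratio estimate R = sum_y / sum_x and its standard error by (2.47).

    The sample mean of x replaces the unknown population mean in the denominator;
    f is the sampling fraction (0 ignores the fpc).
    """
    R = sum_y / sum_x
    xbar = sum_x / n
    resid_ss = sum_y2 - 2 * R * sum_xy + R ** 2 * sum_x2   # sum of (y - R x)^2
    se = math.sqrt(1 - f) / (math.sqrt(n) * xbar) * math.sqrt(resid_ss / (n - 1))
    return R, se

n = 33
# Sums taken from Table 2.3.
R1, se1 = ratio_and_se(n, 907.2, 123, 28224, 3595.5, 533)       # food cost per person
R2, se2 = ratio_and_se(n, 907.2, 2394, 28224, 66678, 177254)    # food cost / income
print(f"R1 = {R1:.2f}, s(R1) = {se1:.3f}")
print(f"R2 = {100 * R2:.1f}%, s(R2) = {100 * se2:.2f}%")
```

Working from the expanded sum of squares, as (2.47) does, avoids a second pass over the individual families once the sums are available.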


2.12 ESTIMATES OF MEANS OVER SUBPOPULATIONS 


In many surveys, estimates are made for each of a number of classes into which
the population is subdivided. In a household survey separate estimates might be
wanted for families with 0, 1, 2, ... children, for owners and renters, or for
[…]. The term domains of study has been given […]

[…] (k = 1, 2, ..., $n_j$) are the measurements […]
for the jth domain is estimated by

$$\bar{y}_j = \sum_{k=1}^{n_j}\frac{y_{jk}}{n_j} \tag{2.48}$$

At first sight $\bar{y}_j$ seems to be a ratio estimate as in section 2.11. Although n is
fixed, $n_j$ will vary from one sample of size n to another. The complication of a ratio
estimate can be avoided by considering the distribution of $\bar{y}_j$ over samples in
which both n and $n_j$ are fixed. We assume $n_j > 0$.

In the totality of samples with given n and $n_j$ the probability that any specific set
of $n_j$ units from the $N_j$ units in domain j is drawn is

$$\frac{\binom{N-N_j}{n-n_j}}{\binom{N_j}{n_j}\binom{N-N_j}{n-n_j}} = \frac{1}{\binom{N_j}{n_j}}$$

Since each specific set of $n_j$ units from domain j can appear with all selections of
$(n-n_j)$ units from the $(N-N_j)$ that are not in domain j, the numerator above is the
number of samples containing a specified set of $n_j$, and the denominator is the total
number of samples. It follows that theorems 2.1, 2.2, and 2.4 apply to the $y_{jk}$ if we
put $n_j$ for n and $N_j$ for N.


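From theorem 2.1: $\bar{y}_j$ is an unbiased estimate of $\bar{Y}_j$   (2.49)

From theorem 2.2: the standard error of $\bar{y}_j$ is

$$\frac{S_j}{\sqrt{n_j}}\sqrt{1-(n_j/N_j)} \tag{2.50}$$

where

$$S_j^2 = \sum_{k=1}^{N_j}\frac{(y_{jk}-\bar{Y}_j)^2}{N_j-1} \tag{2.51}$$

From theorem 2.4: An estimate of the standard error of $\bar{y}_j$ is

$$\frac{s_j}{\sqrt{n_j}}\sqrt{1-(n_j/N_j)} \tag{2.52}$$

where

$$s_j^2 = \sum_{k=1}^{n_j}\frac{(y_{jk}-\bar{y}_j)^2}{n_j-1} \tag{2.53}$$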

If the value of $N_j$ is not known, the quantity n/N may be used in place of $n_j/N_j$
when computing the fpc. (With simple random sampling, $n_j/N_j$ is an unbiased
estimate of n/N.)
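
The estimates (2.48), (2.52), and (2.53) translate directly into a few lines of code. The Python sketch below is one possible arrangement; the function name `domain_mean_and_se`, its arguments, and the three data values are hypothetical, and the fallback of ignoring the fpc when neither $N_j$ nor n/N is supplied is an assumption of the sketch rather than a recommendation from the text.

```python
import math

def domain_mean_and_se(y_domain, N_j=None, n=None, N=None):
    """Mean of the sample units falling in one domain and its estimated
    standard error, following (2.52) and (2.53).

    y_domain : measurements on the n_j sample units in the domain
    N_j      : number of population units in the domain, if known
    n, N     : overall sample and population sizes, used for the fpc when N_j is unknown
    """
    n_j = len(y_domain)
    if n_j < 2:
        raise ValueError("need at least two units in the domain to estimate s_j")
    ybar_j = sum(y_domain) / n_j
    s2_j = sum((y - ybar_j) ** 2 for y in y_domain) / (n_j - 1)
    # Sampling fraction for the fpc: n_j/N_j if N_j is known, otherwise n/N.
    if N_j is not None:
        frac = n_j / N_j
    elif n is not None and N is not None:
        frac = n / N
    else:
        frac = 0.0  # assumption of this sketch: ignore the fpc
    se = math.sqrt(s2_j / n_j) * math.sqrt(1 - frac)
    return ybar_j, se

# Hypothetical domain data (not from the text).
mean_j, se_j = domain_mean_and_se([4.0, 6.0, 8.0], N_j=30)
print(mean_j, round(se_j, 4))
```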


2.13 ESTIMATES OF TOTALS OVER SUBPOPULATIONS


In a firm's list of accounts receivable, in which some accounts have been paid
and some not, we might wish to estimate by a sample the total dollar amount of
unpaid bills. If $N_j$ (the number of unpaid bills in the population) is known, there is
no problem. The sample estimate is $N_j\bar{y}_j$, and its conditional standard error is $N_j$
times expression (2.50).
Alternatively, if the total amount receivable in the list is known, a ratio estimate
can be used. The sample gives an estimate of the ratio (total amount of unpaid
bills)/(total amount of all bills). This is multiplied by the known total amount
receivable in the list.

If neither $N_j$ nor the total receivables is known, these estimates cannot be made.
Instead, we multiply the sample total of the y's over units falling in the jth domain
by the raising factor N/n. This gives the estimate

$$\hat{Y}_j = \frac{N}{n}\sum_{k=1}^{n_j} y_{jk} \tag{2.54}$$

We will show that $\hat{Y}_j$ is unbiased and obtain its standard error over repeated
samples of size n. The device of keeping $n_j$ fixed as well as n does not help in this
problem.

In presenting the proof we revert to the original notation, in which $y_i$ is the
measurement on the ith unit in the population. Define for every unit in the
population a new variate $y_i'$, where

$$y_i' = \begin{cases} y_i & \text{if the unit is in the jth domain,} \\ 0 & \text{otherwise} \end{cases}$$

The population total of the $y_i'$ is

$$\sum_{i=1}^{N} y_i' = \sum_{j\text{th dom}} y_i = Y_j \tag{2.55}$$

In a simple random sample of size n, $y_i' = y_i$ for each of the $n_j$ units that lie in the
jth domain; $y_i' = 0$ for each of the remaining $n - n_j$ units. If $\bar{y}'$ is the ordinary
sample mean of the $y_i'$, the quantity

$$N\bar{y}' = \frac{N}{n}\sum_{i=1}^{n} y_i' = \frac{N}{n}\sum_{k=1}^{n_j} y_{jk} = \hat{Y}_j \tag{2.56}$$

This result shows that the estimate $\hat{Y}_j$ as defined in equation (2.54) is N times
the sample mean of the $y_i'$.

In repeated samples of size n we can clearly apply theorems 2.1, 2.2, and 2.4 to
the variates $y_i'$. These show that $\hat{Y}_j$ is an unbiased estimate of $Y_j$ with standard
error

$$\sigma(\hat{Y}_j) = \frac{NS'}{\sqrt{n}}\sqrt{1-(n/N)} \tag{2.57}$$

where $S'$ is the population standard deviation of the $y_i'$. In order to compute $S'$, we
regard the population as consisting of the $N_j$ values $y_i$ that are in the jth domain
and of $N - N_j$ zero values. Thus

$$S'^2 = \frac{1}{N-1}\left(\sum_{j\text{th dom}} y_i^2 - \frac{Y_j^2}{N}\right) \tag{2.58}$$

From theorem 2.4 a sample estimate of the standard error of $\hat{Y}_j$ is

$$s(\hat{Y}_j) = \frac{Ns'}{\sqrt{n}}\sqrt{1-(n/N)} \tag{2.59}$$

In computing $s'$, any unit not in the jth domain is given a zero value. Some students
seem to have a psychological objection to doing this, but the method is sound.

The methods of this and the preceding section also apply to surveys in which the
frame used contains units that do not belong to the population as it has been
defined. An example illustrates this application.


Example. From a list of 2422 minor household expenditures a simple random sample
of 180 items was drawn in order to estimate the total spent for operation of the household.
Certain types of expenditure (on clothing and car upkeep) were not considered relevant. Of
the 180 sample items, 152 were relevant. The sum and uncorrected sum of squares of the
relevant amounts (in dollars) were as follows.

$$\sum y_i = 343.5, \qquad \sum y_i^2 = 1491.38$$

Estimate the total expenditure for household operation and give the standard error of the
estimate.

$$\hat{Y}_j = \frac{N}{n}\sum y_{jk} = \frac{(2422)(343.5)}{180} = \$4622$$

From (2.59)

$$s(\hat{Y}_j) = \frac{Ns'}{\sqrt{n}}\sqrt{1-(n/N)}$$

In computing $s'$ we regard our sample of 180 items as having 28 zeros. Hence

$$s'^2 = \frac{1}{179}\left[\sum y_i^2 - \frac{(\sum y_i)^2}{180}\right] = \frac{1}{179}\left[1491.38 - \frac{(343.5)^2}{180}\right] = 4.670$$

Finally,

$$s(\hat{Y}_j) = (2422)\sqrt{\frac{4.670}{180}\left(1-\frac{180}{2422}\right)} = \$375$$

The estimate is not precise, its coefficient of variation 375/4622 being about 8%.
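
The household-expenditure example can be reproduced from the two sums given. In the Python sketch below (function and argument names are ours), units outside the domain enter as zeros: they contribute nothing to the sums but are counted in n, exactly as in the computation of $s'$ above.

```python
import math

def domain_total_and_se(N, n, sum_y, sum_y2):
    """Estimate of a domain total when N_j is unknown, following (2.54) and (2.59).

    sum_y, sum_y2 : sum and uncorrected sum of squares of the sample values in the
                    domain; units outside the domain count as zeros, so they add
                    nothing to either sum but still count toward n.
    """
    total = N * sum_y / n
    s2_prime = (sum_y2 - sum_y ** 2 / n) / (n - 1)   # s'^2 over all n units
    se = N * math.sqrt(s2_prime / n) * math.sqrt(1 - n / N)
    return total, se

total, se = domain_total_and_se(N=2422, n=180, sum_y=343.5, sum_y2=1491.38)
print(round(total), round(se), f"cv = {se / total:.1%}")
```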


In this example expenditures on car upkeep and clothing were excluded as not
relevant and therefore were scored as zeros in the sample. In some applications it
is known in advance that certain units in the population contribute nothing to the
total that is being estimated. For instance, in a survey of stores to estimate total
sales of luggage, some stores do not handle luggage; certain area sampling units
for farm studies contain no farms. Sometimes it is possible, by expenditure of
effort, to identify and count the units that contribute nothing, so that in our
notation $(N - N_j)$, hence $N_j$, is known.

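Consequently it is worth examining by how much $V(\hat{Y}_j)$ is reduced when $N_j$ is
known. If $N_j$ is not known, (2.57) gives

$$V(\hat{Y}_j) = \frac{N^2S'^2}{n}\left(1-\frac{n}{N}\right)$$

If $\bar{Y}_j$ and $S_j$ are the mean and standard deviation in the domain of interest (i.e.,
among the nonzero units) the reader may verify that

$$(N-1)S'^2 = (N_j-1)S_j^2 + N_j\bar{Y}_j^2\left(1-\frac{N_j}{N}\right) \tag{2.60}$$

Since terms in $1/N_j$ and 1/N are nearly always negligible,

$$S'^2 \doteq P_jS_j^2 + P_jQ_j\bar{Y}_j^2 \tag{2.61}$$

where $P_j = N_j/N$ and $Q_j = 1 - P_j$. This gives

$$V(\hat{Y}_j) \doteq \frac{N^2}{n}\left(P_jS_j^2 + P_jQ_j\bar{Y}_j^2\right)\left(1-\frac{n}{N}\right) \tag{2.62}$$

If nonzero units are identified, we draw a sample of size $n_j$ from them. The
estimate of the domain total is $N_j\bar{y}_j$ with variance

$$V(N_j\bar{y}_j) = \frac{N_j^2}{n_j}S_j^2\left(1-\frac{n_j}{N_j}\right) = \frac{N^2}{n_j}P_j^2S_j^2\left(1-\frac{n_j}{N_j}\right) \tag{2.63}$$

The comparable variances are (2.62) and (2.63). In (2.62) the average number of
nonzero units in the sample of size n is $nP_j$. If we take $n_j = nP_j$ in (2.63), so that the
number of nonzeros to be measured is about the same with both methods, (2.63)
becomes

$$V(N_j\bar{y}_j) = \frac{N^2}{n}P_jS_j^2\left(1-\frac{n}{N}\right) \tag{2.64}$$

The ratio of the variances (2.64) to (2.62) is

$$\frac{V(N_j \text{ known})}{V(N_j \text{ not known})} = \frac{S_j^2}{S_j^2 + Q_j\bar{Y}_j^2} = \frac{C_j^2}{C_j^2 + Q_j} \tag{2.65}$$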


where $C_j = S_j/\bar{Y}_j$ is the coefficient of variation among the nonzeros. As might be
expected, the reduction in variance due to a knowledge of $N_j$ is greater when the
proportion of zero units is large and when $y_i$ varies relatively little among the
nonzero units. For further study of this problem, see Jessen and Houseman
(1944).


2.14 COMPARISONS BETWEEN DOMAIN MEANS


Let $\bar{y}_j$, $\bar{y}_k$ be the sample means in the jth and kth of a set of domains into which
the units in a simple random sample are classified. The variance of their difference
is

$$V(\bar{y}_j - \bar{y}_k) = V(\bar{y}_j) + V(\bar{y}_k) \tag{2.66}$$

This formula applies also to the difference between two ratios $\hat{R}_j$ and $\hat{R}_k$.
One point should be noted. It is seldom of scientific interest to ask whether
$\bar{Y}_j = \bar{Y}_k$, because these means would not be exactly equal in a finite population,
except by a rare chance, even if the data in both domains were drawn at random
from the same infinite population. Instead, we test the null hypothesis that the two
domains were drawn from infinite populations having the same mean. Consequently
we omit the fpc when computing $V(\bar{y}_j)$ and $V(\bar{y}_k)$, using the formula

$$V(\bar{y}_j - \bar{y}_k) = \frac{S_j^2}{n_j} + \frac{S_k^2}{n_k} \tag{2.67}$$

A formula similar to (2.67) is obtained for tests of significance if one frames the
question: Could the samples from the two domains have been drawn at random
from the same finite population?

Under this null hypothesis it may be proved (see exercise 2.16) that

$$V(\bar{y}_j - \bar{y}_k) = S_{jk}^2\left(\frac{1}{n_j} + \frac{1}{n_k}\right)$$

where $S_{jk}^2$ is the variance of the finite population consisting of the combined
domains.


2.15 VALIDITY OF THE NORMAL APPROXIMATION 


Confidence that the normal approximation is adequate in most practical
situations comes from a variety of sources. In the theory of probability much study
has been made of the distribution of means of random samples. It has been proved
that for any population that has a finite standard deviation the distribution of the
sample mean tends to normality as n increases (see, e.g., Feller, 1957). This work
relates to infinite populations.

For sampling without replacement from finite populations, Hájek (1960) has
given necessary and sufficient conditions under which the distribution of the
sample mean tends to normality, following work by Erdös and Rényi (1959) and
Madow (1948). Hájek assumes a sequence of values $n_\nu$, $N_\nu$ tending to infinity in
such a way that $(N_\nu - n_\nu)$ also tends to infinity. The measurements in the $\nu$th
population are denoted by $y_{\nu i}$ (i = 1, 2, ..., $N_\nu$). For this population, let $S_{\nu\tau}$ be the
set of units in the population for which

$$|y_{\nu i} - \bar{Y}_\nu| > \tau\sqrt{n_\nu(1-f_\nu)}\,S_\nu$$

where $\bar{Y}_\nu$, $S_\nu$, $f_\nu$ are the population mean, s.d., and fpc, and $\tau$ is a number > 0. Then
the Lindeberg-type condition

[…]

is necessary and sufficient to ensure that $\bar{y}_\nu$ tends to normality with the mean and
variance given in theorems 2.1 and 2.2.

This imposing body of knowledge leaves something to be desired. It is not easy
to answer the direct question: “For this population, how large must n be so that
the normal approximation is accurate enough?” Non-normal distributions vary
greatly both in the nature and in the degree of their departure from normality. The
distributions of many types of economic enterprise (stores, chicken farms, towns)
exhibit a marked positive skewness, with a few large units and many small units.
The same kind of skewness is displayed by some biological populations (e.g., the
number of rats or flies per city block).

Fig. 2.1. Frequency distribution of sizes of 196 United States cities in 1920.

As an illustration of a positively skewed distribution, Fig. 2.1 shows the
frequency distribution of the numbers of inhabitants in 196 large United States
cities in 1920. (The four largest cities, New York, Chicago, Philadelphia, and
Detroit, were omitted. Their inclusion would extend the horizontal scale to more
than five times the length shown and would, of course, greatly accentuate the
skewness.) Figure 2.2 shows the frequency distribution of the total number of
inhabitants in each of 200 simple random samples, with n = 49, drawn from this
population. The distribution of the sample totals, and likewise of the means, is
much more similar to a normal curve but still displays some positive skewness.

Fig. 2.2. Frequency distribution of totals of 200 simple random samples with n = 49. (Horizontal axis: millions; vertical axis: frequency.)

From statistical theory and from the results of sampling experiments on skewed 
populations, some statements can be made about what usually happens to 
confidence probabilities when we sample from positively skew populations, as 
follows: 


1. The frequency with which the assertion

$$\bar{y} - 1.96s_{\bar{y}} < \bar{Y} < \bar{y} + 1.96s_{\bar{y}}$$

is wrong is usually higher than 5%.
2. The frequency with which

$$\bar{Y} > \bar{y} + 1.96s_{\bar{y}}$$

is greater than 2.5%.
3. The frequency with which

$$\bar{Y} < \bar{y} - 1.96s_{\bar{y}}$$

is less than 2.5%.


As an illustration, consider a variate y that is essentially binomially distributed,
so that the exact distribution of $\bar{y}$ can be read from the binomial tables. The variate
y takes only two values: the value h with probability P and the value 0 with
probability Q. The population mean is $\bar{Y} = Ph$. A simple random sample of size n
shows a units that have the value h and $n - a$ units that have the value 0. For the
sample,

$$\sum y = ah, \qquad \bar{y} = \frac{ah}{n}, \qquad s^2 = \frac{a(n-a)h^2}{n(n-1)}$$

Hence 95% normal confidence limits for $\bar{Y}$ are estimated as

$$\bar{y} \pm 1.96s_{\bar{y}} = \frac{h}{n}\left[a \pm 1.96\sqrt{\frac{a(n-a)}{n-1}}\right] \tag{2.68}$$

Let n = 400, P = 0.1. Then $\bar{Y} = 0.1h$. By trial we find that if a = 29 in expression
(2.68) the upper confidence limit is 39.18h/400 = 0.098h, whereas a = 30 gives
40.34h/400 = 0.101h. Hence any value of $a \le 29$ gives an upper confidence limit
that is too low. Similarly we find that if $a \ge 54$ the lower limit is too high.

The variate a follows the binomial distribution with n = 400, P = 0.1. The
tables (Harvard Computation Laboratory, 1955) show that

Pr (stated upper limit too low) = Pr ($a \le 29$) = 0.0357
Pr (stated lower limit too high) = Pr ($a \ge 54$) = 0.0217

Pr (confidence statement wrong) = 0.0574

The total probability of being wrong is not far from 0.05. In more than 60% of
the wrong statements, the true mean is higher than the stated upper limit.
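
The search "by trial" for the critical values of a, and the exact binomial tail probabilities, can be carried out directly instead of from tables. The Python sketch below does so for n = 400, P = 0.1, working in units of h/n so that h drops out; the function names are illustrative, and the fpc is ignored as in (2.68).

```python
import math

def binomial_pmf(k, n, p):
    """Exact binomial probability Pr(a = k)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def limits_in_units_of_h_over_n(a, n, z=1.96):
    """Lower and upper limits from (2.68), expressed as multiples of h/n."""
    half_width = z * math.sqrt(a * (n - a) / (n - 1))
    return a - half_width, a + half_width

n, P = 400, 0.1
target = n * P   # the true mean 0.1h, in units of h/n

# Values of a for which the stated interval misses the true mean.
too_low = [a for a in range(n + 1) if limits_in_units_of_h_over_n(a, n)[1] < target]
too_high = [a for a in range(n + 1) if limits_in_units_of_h_over_n(a, n)[0] > target]

pr_too_low = sum(binomial_pmf(a, n, P) for a in too_low)
pr_too_high = sum(binomial_pmf(a, n, P) for a in too_high)
print(max(too_low), min(too_high))
print(round(pr_too_low, 4), round(pr_too_high, 4))
```

Under these assumptions the lower-tail probability comes out close to the 0.0357 quoted above. The upper-tail figure is sensitive to the cutoff: an exact calculation gives a value near the quoted 0.0217 for $a \ge 53$ and a smaller value, roughly 0.015, for $a \ge 54$, so the result is worth checking when the boundary is read from tables.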
There is no safe general rule as to how large n must be for use of the normal
approximation […]

$$n > 25G_1^2 \tag{2.69}$$

where $G_1$ is Fisher's measure of skewness (Fisher, 1932),

$$G_1 = \frac{E(y_i-\bar{Y})^3}{\sigma^3} = \frac{1}{N\sigma^3}\sum_{i=1}^{N}(y_i-\bar{Y})^3 \tag{2.70}$$

This rule is designed so that a 95% confidence probability statement will be
wrong not more than 6% of the time. It is derived mathematically by assuming
that any disturbance due to moments of the distribution of y higher than the third
is negligible. The rule attempts to control only the total frequency of wrong
statements, ignoring the direction of the error of estimate.

By calculating $G_1$, or an estimate, for a specific population, we can obtain a
rough idea of the sample size needed for application of the normal approximation
to compute confidence limits. The result should be checked by sampling experiments
whenever possible.


TABLE 2.4
FREQUENCY DISTRIBUTION OF ACRES IN CROPS ON 556 FARMS

Class intervals   Coded scale   Frequency
(acres)           $y_i$         $f_i$       $f_iy_i$    $f_iy_i^2$    $f_iy_i^3$
0-29                -0.9           47        -42.3         38.1          -34.3
30-63                0            143          0            0              0
64-97                1            154        154          154            154
98-131               2             82        164          328            656
132-165              3             62        186          558          1,674
166-199              4             33        132          528          2,112
200-233              5             13         65          325          1,625
234-267              6              6         36          216          1,296
268-301              7              4         28          196          1,372
302-335              8              6         48          384          3,072
336-369              9              2         18          162          1,458
370-403             10              0          0            0              0
404-437             11              2         22          242          2,662
438-471             12              0          0            0              0
472-505             13              2         26          338          4,394
Totals                            556        836.7      3,469.1       20,440.7
$$E(y_i) = \bar{Y} = \frac{836.7}{556} = 1.50486$$

$$E(y_i^2) = \frac{3469.1}{556} = 6.23939$$

$$E(y_i^3) = \frac{20{,}440.7}{556} = 36.76385$$

$$\sigma^2 = E(y_i^2) - \bar{Y}^2 = 3.97479$$

$$\kappa_3 = E(y_i - \bar{Y})^3 = E(y_i^3) - 3E(y_i^2)\bar{Y} + 2\bar{Y}^3$$

Example. The data in Table 2.4 show the numbers of acres devoted to crops on 556
farms in Seneca County, New York. The data come from a series of studies by West (1951),
who drew repeated samples of size 100 from this population and examined the frequency
distributions of $\bar{y}$, s, and Student's t for several items of interest in farm management
surveys.

The computation of $G_1$ is shown under the table. The computations are made on a coded
scale, and, since $G_1$ is a pure number, there is no need to return to the original scale. Note
that the first class-interval was slightly different from the others.

Since $G_1 = 1.9$, we take as a suggested minimum n

$$n = (25)(1.9)^2 = 90$$

For samples of size 100, West found with this item (acres in crops) that neither the
distribution of $\bar{y}$ nor that of Student's t differed significantly from the corresponding
theoretical normal distributions.
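
The skewness calculation for Table 2.4 can be carried through to $G_1$ with a few lines of code. The Python sketch below uses the coded scale and frequencies from the table; the helper name `raw_moment` is ours, and the moments are computed from the unrounded products $f_iy_i^2$ and $f_iy_i^3$, so they may differ from the tabulated totals in the last digit.

```python
import math

# Coded scale values and frequencies from Table 2.4.
y = [-0.9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
f = [47, 143, 154, 82, 62, 33, 13, 6, 4, 6, 2, 0, 2, 0, 2]

def raw_moment(values, freqs, power):
    """Population moment E(y^power) for a frequency distribution."""
    return sum(fi * yi ** power for yi, fi in zip(values, freqs)) / sum(freqs)

m1 = raw_moment(y, f, 1)
m2 = raw_moment(y, f, 2)
m3 = raw_moment(y, f, 3)
sigma2 = m2 - m1 ** 2
kappa3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3   # third central moment
G1 = kappa3 / sigma2 ** 1.5               # (2.70)
print(round(G1, 2), math.ceil(25 * G1 ** 2))
```

With the unrounded $G_1$ the rule (2.69) gives a somewhat larger minimum n than the 90 obtained in the example from the rounded value 1.9; either figure is only a rough guide.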


Good sampling practice tends to make the normal approximation more valid. 
Failure of the normal approximation occurs mostly when the population contains 
some extreme individuals who dominate the sample average when they are 
present. However, these extremes also have a much more serious effect of 
increasing the variance of the sample and decreasing the precision. Consequently, 
it is wise to segregate them and make separate plans for coping with them, perhaps 
by taking a complete enumeration of them if they are not numerous. This removal 
of the extremes from the main body of the population reduces the skewness and 
improves the normal approximation. This technique is an example of stratified 
sampling, which is discussed in Chapter 5. 


2.16 LINEAR ESTIMATORS OF THE POPULATION MEAN 


Under simple random sampling, is the sample mean $\bar{y}$ the best estimator of $\bar{Y}$?
This question has naturally attracted a good deal of work. The answer depends on
the set of competitors to $\bar{y}$ that are allowed and on the definition of “best.” For
sampling, the units in the population are usually numbered in some way from 1 to
N. These numbers that identify the units are often called the labels attached to the
units.


Early results proved by Horvitz and Thompson (1952) for linear estimators in
simple random sampling are as follows. If any $y_i$ always receives the same weight
$w_i$ whenever unit labeled i is drawn, the sample mean $\bar{y}$ is the only unbiased
estimator of $\bar{Y}$ of the form $\sum^{n} w_iy_i$. Since every unit appears in a fraction n/N of
simple random samples, $E\left(\sum^{n} w_iy_i\right) = n\left(\sum^{N} w_iy_i\right)/N \equiv \bar{Y}$ only if every $w_i = 1/n$. If,
alternatively, the weight depends only on the order in which the unit is drawn into
the sample, then $\bar{y}$ has minimum variance among unbiased linear estimators of the
form $\sum_{d=1}^{n} w_dy_{(d)}$, where $y_{(d)}$ is the y-value on the unit that turns up at the dth draw.
A wider class of competitors is the set $\sum w_{is}y_i$, where $w_{is}$ may depend on the
other units that fall into the sample as well as on i. Godambe (1955) showed that in
this class no unbiased estimate of $\bar{Y}$ exists with minimum variance for all
populations.

Further properties of $\bar{y}$ have been developed by Hartley and J. N. K. Rao (1968,
1969), Royall (1968), and C. R. Rao (1971). In any finite population there will be
at most $T \le N$ distinct values of the $y_i$. In the sample, let there be $n_t$ values equal to
$y_t$, where $\sum n_t = n$. Hartley and Rao (1968) show that $\bar{y}$ has minimum variance
among unbiased estimators of $\bar{Y}$ that are functions only of the $n_t$ and $y_t$. For
random sampling with replacement, they show that the mean of the distinct values
in the sample is the maximum likelihood estimator of $\bar{Y}$, although it does not have
minimum variance in all populations.

C. R. Rao (1971), following work by Kempthorne (1969), considered unbiased
estimators $\hat{\bar{Y}} = \sum w_{is} y_i$ discussed by Godambe. In order to represent the case in
which the labels $i$ supply no information about the values $y_i$, he calculated the
average of $V(\hat{\bar{Y}})$ over all $N!$ permutations of the labels attached to the values, and
showed that over these permutations, $\bar{y}$ has minimum average variance. Royall
(1970b) has given a more general result.

Godambe’s (1955) work has stimulated numerous investigations on sampling
design and estimation, including topics such as criteria by which to judge
estimators, the role of maximum likelihood, the use of auxiliary information that
the labels may carry about the $y_i$, Bayesian estimators, and methods of estimation
when assumptions can be made about the frequency distribution of the $y_i$. The
influence of this work on sampling practice has been limited thus far, but should
steadily increase. Some reference to it will be made from time to time. For
reviews, see J. N. K. Rao* (1975a) and Smith (1976).


EXERCISES 


2.1 In a population with $N = 6$ the values of $y_i$ are 8, 3, 1, 11, 4, and 7. Calculate the
sample mean $\bar{y}$ for all possible simple random samples of size 2. Verify that $\bar{y}$ is an unbiased
estimate of $\bar{Y}$ and that its variance is as given in theorem 2.2.


2.2 For the same population, calculate $s^2$ for all simple random samples of size 3 and
verify that $E(s^2) = S^2$.

2.3 If random samples of size 2 are drawn with replacement from this population, show
by finding all possible samples that $V(\bar{y})$ satisfies the equation

$$V(\bar{y}) = \frac{\sigma^2}{n} = \frac{S^2}{n}\,\frac{(N-1)}{N}$$

2.4 A simple random sample of 30 households was drawn from a city area containing
14,548 households. The numbers of persons per household in the sample were as follows.


5, 6, 3, 3, 2, 3, 3, 3, 4, 4, 3, 2, 7, 4, 3, 5, 4, 4, 3, 3, 4, 3, 3, 1, 2, 4, 3, 4, 2, 4


Estimate the total number of people in the area and compute the probability that this
estimate is within ±10% of the true value.

* Henceforth in this book the surname Rao will refer to J. N. K. Rao unless otherwise noted.

2.5 In a study of the possible use of sampling to cut down the work in taking inventory
in a stock room, a count is made of the value of the articles on each of 36 shelves in the
room. The values to the nearest dollar are as follows.


29, 38, 42, 44, 45, 47, 51, 53, 53, 54, 56, 56, 56, 58, 58, 59, 60, 60,
60, 60, 61, 61, 61, 62, 64, 65, 65, 67, 67, 68, 69, 71, 74, 77, 82, 85.

The estimate of total value made from a sample is to be correct within $200, apart from a
1 in 20 chance. An advisor suggests that a simple random sample of 12 shelves will meet the
requirements. Do you agree?


Σy = 2138, Σy² = 131,682


2.6 After the sample in Table 2.2 (p. 28) was taken, the number of completely filled
sheets (with 42 signatures each) was counted and found to be 326. Use this information to
make an improved estimate of the total number of signatures and find the standard error of
your estimate.


2.7 From a list of 468 small 2-year colleges a simple random sample of 100 colleges was
drawn. The sample contained 54 public and 46 private colleges. Data for number of
students (y) and number of teachers (x) are shown below.


               n        Σ(y)         Σ(x)
Public        54       31,281        2,024
Private       46       13,707        1,075

              Σ(y²)        Σ(yx)        Σ(x²)
Public     29,881,219    1,729,349     111,090
Private     6,366,785      431,041      33,119


(a) For each type of college in the population, estimate the ratio (number of
students)/(number of teachers). (b) Compute the standard errors of your estimates. (c) For
the public colleges, find 90% confidence limits for the student/teacher ratio in the whole
population.


2.8 In the preceding example test at the 5% level whether the student/teacher ratio is
significantly different in the two types of colleges.


2.9 For the public colleges, estimate the total number of teachers (a) given that the
total number of public colleges in the population is 251, (b) without knowing this figure. In
each case compute the standard error of your estimate.

FREQUENCY DISTRIBUTION OF CITY SIZES


Size Class          Size Class          Size Class
(1000's)     f      (1000's)     f      (1000's)     f
 50-100     105     550-600      2        ...
100-150      36     600-650      1      1500-1550    1
150-200      13     650-700      2        ...
200-250       6     700-750      0      1600-1650    1
250-300       7     750-800      1        ...
300-350       8     800-850      1      1900-1950    1
350-400       4     850-900      2        ...
400-450       1     900-950      0      3350-3400    1
450-500       3     950-1000     0        ...
500-550       0     1000-1050    0      7450-7500    1


Gaps in the intervals are indicated by ....


2.11 Calculate the coefficient of skewness $G_1$ for the original population and the
population remaining after removing (a) the five largest cities, (b) the nine largest cities.


2.12 A small survey is to be taken to compare home-owners with renters. In the
population about 75% are owners, 25% are renters. For one item the variance is thought to
be about 15 for both owners and renters. The standard error of the difference between the 
two domain means is not to exceed 1. How large a sample is needed (a) if owners and 
renters can be identified in advance of drawing the sample, (b) if not? (An approximate 
answer will do in (b); an exact discussion requires binomial tables.) 

2.13 A simple random sample of size 3 is drawn from a population of size $N$ with
replacement. Show that the probabilities that the sample contains 1, 2, and 3 different units
(for example, aaa, aab, abc, respectively) are

$$P_1 = \frac{1}{N^2}, \qquad P_2 = \frac{3(N-1)}{N^2}, \qquad P_3 = \frac{(N-1)(N-2)}{N^2}$$

As an estimate of $\bar{Y}$ we take $\bar{y}'$, the unweighted mean over the different units in the
sample. Show that the average variance of $\bar{y}'$ is

$$V(\bar{y}') = \frac{(2N-1)(N-1)S^2}{6N^2} \doteq (1 - f/2)S^2/3$$

One way to do this is to show that

$$V(\bar{y}') = S^2\left[\frac{(N-1)}{N}P_1 + \frac{(N-2)}{2N}P_2 + \frac{(N-3)}{3N}P_3\right]$$

Hence show that $V(\bar{y}') < V(\bar{y})$, where $\bar{y}$ is the ordinary mean of the $n$ observations
in the sample. The result that $V(\bar{y}') < V(\bar{y})$ for any $n > 2$ was proved by Des Raj and
Khamis (1958).
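
The results stated in exercise 2.13 can be checked numerically before a general proof is attempted. The following is a minimal sketch, not part of the original exercise: it enumerates every ordered with-replacement sample of size 3 and compares the outcome with the formulas above. The function name `enumerate_distinct_mean` is hypothetical, and the population values are borrowed from exercise 2.1 purely as an illustration.

```python
from fractions import Fraction
from itertools import product

def enumerate_distinct_mean(y):
    """Enumerate all N**3 ordered samples of size 3 drawn with replacement.

    Returns the probabilities of seeing 1, 2, or 3 distinct units and the
    mean squared deviation of the distinct-unit mean about the population
    mean (equal to its variance, since that estimator is unbiased).
    """
    N = len(y)
    Ybar = Fraction(sum(y), N)
    counts = {1: 0, 2: 0, 3: 0}
    sq_dev = Fraction(0)
    for draw in product(range(N), repeat=3):
        units = set(draw)                      # distinct units in this sample
        counts[len(units)] += 1
        ybar_prime = Fraction(sum(y[i] for i in units), len(units))
        sq_dev += (ybar_prime - Ybar) ** 2
    total = N ** 3
    probs = {k: Fraction(c, total) for k, c in counts.items()}
    return probs, sq_dev / total

y = [8, 3, 1, 11, 4, 7]                        # illustrative values only
N = len(y)
Ybar = Fraction(sum(y), N)
S2 = sum((v - Ybar) ** 2 for v in y) / (N - 1)

probs, var_distinct = enumerate_distinct_mean(y)
var_ordinary = Fraction(N - 1, 3 * N) * S2     # sigma^2 / 3 for the ordinary mean

print(probs)
print(var_distinct, var_ordinary)
```

Exact rational arithmetic is used so that the comparison with the closed-form expressions is an equality check rather than a floating-point approximation.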

2.14 Two dentists A and B make a survey of the state of the teeth of 200 children in a
village. Dr. A selects a simple random sample of 20 children and counts the number of
decayed teeth for each child, with the following results.


Number of decayed teeth/child    0   1   2   3   4   5   6   7   8   9   10
Number of children               8   4   2   2   1   1   0   0   0   1    1


Dr. B, using the same dental techniques, examines all 200 children, recording merely 
those who have no decayed teeth. He finds 60 children with no decayed teeth. 

Estimate the total number of decayed teeth in the village children, (a) using A’s results 
only, (b) using both A’s and B’s results. (c) Are the estimates unbiased? (d) Which 
estimate do you expect to be more precise? 

2.15 A company intends to interview a simple random sample of employees who have
been with it more than 5 years. The company has $1000 to spend, and each interview costs
$10. There is no separate list of employees with more than 5 years service, but a list can be
compiled from the files at a cost of $200. The company can either (a) compile the list and
interview a simple random sample drawn from the eligible employees or (b) draw a simple
random sample of all employees, interviewing only those eligible. The cost of rejecting
those not eligible in the sample is assumed negligible.

Show that for estimating a total over the population of eligible employees, plan (a) gives
a smaller variance than plan (b) only if C_j < 2√Q_j, where C_j is the coefficient of variation of
the item among eligible employees and Q_j is the proportion of noneligibles in the company.

Ignore the fpc.

2.16 A simple random sample of size $n = n_1 + n_2$ with mean $\bar{y}$ is drawn from a finite
population, and a simple random subsample of size $n_1$ is drawn from it with mean $\bar{y}_1$. Show
that (a) $V(\bar{y}_1 - \bar{y}_2) = S^2[(1/n_1) + (1/n_2)]$, where $\bar{y}_2$ is the mean of the remaining $n_2$ units in
the sample, (b) $V(\bar{y}_1 - \bar{y}) = S^2[(1/n_1) - (1/n)]$, (c) $\mathrm{Cov}\{\bar{y}, \bar{y}_1 - \bar{y}\} = 0$. Repeated sampling
implies repetition of the drawing of both the sample and the subsample.

2.17 The number of distinct simple random samples of size $n$ is of course $N!/n!(N-n)!$.
There has been some interest in finding smaller sets of samples of size $n$ that have the
same properties as the set of simple random samples. One set is that of balanced incomplete
block (bib) designs. These are samples of $n$ distinct units out of $N$ such that (i) every unit
appears in the same number ($r$) of samples, (ii) every pair of units appears together in $\lambda$
samples.

Verify that $\lambda = r(n-1)/(N-1)$ and that the number of distinct samples in the set is $rN/n$.
Over the set of bib samples, prove in the usual notation that if $\bar{y}$ is the mean of a sample, (a)
$V(\bar{y}) = (1-f)S^2/n$ and (b) $v(\bar{y}) = (1-f)\sum (y_i - \bar{y})^2/n(n-1)$ is an unbiased estimate of
$V(\bar{y})$.

Note. There is no general method for finding the smallest $r$ for which a bib can be
constructed. Sometimes the smallest known $r$ provides $N!/n!(N-n)!$ samples, bringing us
back to simple random samples. But for $N = 91$, $n = 10$, the smallest bib set has 91 samples
as against over 6 United States trillion SRS. Avadhani and Sukhatme (1973) have shown
how bib designs may be used in attempting to reduce travel costs between sampling units.


2.18 The following is an illustration by Royall (1968) of the fact that in simple random
sampling the sample mean $\bar{y}$ does not have uniformly minimum variance in the class of
estimators of the form $\sum^{n} w_{is} y_i$ considered by Godambe (1955), where the weight $w_{is}$ may
depend on the other units that fall in the sample. For $N = 3$, $n = 2$, consider the estimator

$$\hat{\bar{Y}}_{12} = \tfrac{1}{2}y_1 + \tfrac{1}{2}y_2; \qquad \hat{\bar{Y}}_{13} = \tfrac{1}{2}y_1 + \tfrac{2}{3}y_3; \qquad \hat{\bar{Y}}_{23} = \tfrac{1}{2}y_2 + \tfrac{1}{3}y_3$$

where $\hat{\bar{Y}}_{ij}$ is the estimator for the sample that has units $(i, j)$. Prove Royall’s results that $\hat{\bar{Y}}_{ij}$
is unbiased and that $V(\hat{\bar{Y}}_{ij}) < V(\bar{y})$ if $y_3(3y_2 - 3y_1 - y_3) > 0$. The illustration is taken from an
earlier example by Roy and Chakravarti (1960).


2.19 This exercise is another example of estimators geared to particular features of
populations. After the decision to take a simple random sample had been made, it was
realized that $y_1$ would be unusually low and $y_N$ would be unusually high. For this situation,
Sarndal (1972) examined the following unbiased estimator of $\bar{Y}$.

$$\hat{\bar{Y}}_S = \begin{cases} \bar{y} + c & \text{if the sample contains } y_1 \text{ but not } y_N \\ \bar{y} - c & \text{if the sample contains } y_N \text{ but not } y_1 \\ \bar{y} & \text{for all other samples} \end{cases}$$

where $c$ is a constant. Prove Sarndal’s result that $\hat{\bar{Y}}_S$ is unbiased with

$$V(\hat{\bar{Y}}_S) = (1-f)\left[\frac{S^2}{n} - \frac{2c}{(N-1)}\,(y_N - y_1 - nc)\right]$$

so that $V(\hat{\bar{Y}}_S) < V(\bar{y})$ if $0 < c < (y_N - y_1)/n$.

2.20 For a population with $N = 8$ and values $y_i = 1, 4, 5, 5, 6, 6, 8, 13$, show that with
$n = 4$, $V(\bar{y}) = 1.5$, while $V(\hat{\bar{Y}}_S) = 0.214$ when $c = 1.5$ (its best value), and 0.357 when $c = 1$
or 2.

Given the information in exercise 2.19, an alternative sampling plan is to include both $y_1$
and $y_8$ in every sample, drawing a simple random sample of size 2 from $y_2, \ldots, y_7$, with
mean $\bar{y}_2$. The estimate of $\bar{Y}$ is

$$\hat{\bar{Y}}_{st} = (y_1 + 6\bar{y}_2 + y_8)/8$$

Show that $\hat{\bar{Y}}_{st}$ is unbiased, with variance $9V(\bar{y}_2)/16$. For this population, show that
$V(\hat{\bar{Y}}_{st}) = 0.350$. This estimator is an example of stratified sampling (Chapter 5) with three
strata: $y_1$; $y_2, \ldots, y_7$; and $y_8$.
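
The numerical values quoted in exercise 2.20 can be confirmed by brute-force enumeration, which is a useful check on the algebra asked for in exercises 2.19 and 2.20. The sketch below is an illustration under stated assumptions, not a substitute for the proofs: the function names are hypothetical, and the population is stored in increasing order so that the first and last entries play the roles of the unusually low and unusually high units.

```python
from fractions import Fraction
from itertools import combinations

y = [1, 4, 5, 5, 6, 6, 8, 13]          # population of exercise 2.20, in increasing order
N, n = len(y), 4
Ybar = Fraction(sum(y), N)

def sarndal_variance(c):
    """Variance of the adjusted estimator over all simple random samples of size n."""
    c = Fraction(c)
    estimates = []
    for s in combinations(range(N), n):
        est = Fraction(sum(y[i] for i in s), n)
        has_low, has_high = 0 in s, (N - 1) in s
        if has_low and not has_high:
            est += c                    # sample holds the low unit only
        elif has_high and not has_low:
            est -= c                    # sample holds the high unit only
        estimates.append(est)
    assert sum(estimates) / len(estimates) == Ybar   # unbiasedness
    return sum((e - Ybar) ** 2 for e in estimates) / len(estimates)

def stratified_variance():
    """Variance when both extremes are always included and 2 of the middle 6 are sampled."""
    estimates = [Fraction(y[0] + 3 * (y[i] + y[j]) + y[-1], N)
                 for i, j in combinations(range(1, N - 1), 2)]
    assert sum(estimates) / len(estimates) == Ybar   # unbiasedness
    return sum((e - Ybar) ** 2 for e in estimates) / len(estimates)

for c in (0, 1, Fraction(3, 2), 2):
    print(c, float(sarndal_variance(c)))
print("stratified", float(stratified_variance()))
```

Setting `c = 0` reproduces the ordinary sample mean, so the same function also returns the variance of the unadjusted estimator.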


CHAPTER 3 
Sampling Proportions 
and Percentages 


3.1 QUALITATIVE CHARACTERISTICS 


Sometimes we wish to estimate the total number, the proportion, or the
percentage of units in the population that possess some characteristic or attribute
or fall into some defined class. Many of the results regularly published from
censuses or surveys are of this form, for example, numbers of unemployed
persons, the percentage of the population that is native-born. The classification
may be introduced directly into the questionnaire, as in questions that are
answered by a simple “yes” or “no.” In other cases the original measurements are
more or less continuous, and the classification is introduced in the tabulation of
results. Thus we may record the respondents’ ages to the nearest year but publish
the percentage of the population aged 60 and over.

Notation. We suppose that every unit in the population falls into one of the 
two classes C and C’. The notation is as follows: 


Number of units in C in Proportion of units in C in 
Population Sample Population Sample 
A a P=A/N p=a/n 


The sample estimate of P is p, and the sample estimate of A is Np or Na/n.
In statistical work the binomial distribution is often applied to estimates like a 
and p. As will be seen, the correct distribution for finite populations is the 
hypergeometric, although the binomial is usually a satisfactory approximation. 


3.2 VARIANCES OF THE SAMPLE ESTIMATES 


By means of a simple device it is possible to apply the theorems established in
Chapter 2 to this situation. For any unit in the sample or population, define $y_i$ as 1
if the unit is in $C$ and as 0 if it is in $C'$. For this population of values $y_i$, it is clear that

$$Y = \sum_{1}^{N} y_i = A \tag{3.1}$$

$$\bar{Y} = \frac{\sum_{1}^{N} y_i}{N} = \frac{A}{N} = P \tag{3.2}$$

Also, for the sample,

$$\bar{y} = \frac{\sum_{1}^{n} y_i}{n} = \frac{a}{n} = p \tag{3.3}$$


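Consequently the problem of estimating $A$ and $P$ can be regarded as that of
estimating the total and mean of a population in which every $y_i$ is either 1 or 0. In
order to use the theorems in Chapter 2, we first express $S^2$ and $s^2$ in terms of $P$ and
$p$. Note that

$$\sum_{1}^{N} y_i^2 = A = NP, \qquad \sum_{1}^{n} y_i^2 = a = np$$

Hence

$$S^2 = \frac{\sum_{1}^{N} (y_i - \bar{Y})^2}{N-1} = \frac{\sum_{1}^{N} y_i^2 - N\bar{Y}^2}{N-1} = \frac{1}{N-1}(NP - NP^2) = \frac{N}{N-1}PQ \tag{3.4}$$

where $Q = 1 - P$. Similarly

$$s^2 = \frac{\sum_{1}^{n} (y_i - \bar{y})^2}{n-1} = \frac{n}{n-1}pq \tag{3.5}$$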

Application of theorems 2.1, 2.2, and 2.4 to this population gives the following 
results for simple random sampling of the units that are being classified. 


Theorem 3.1. The sample proportion $p = a/n$ is an unbiased estimate of the
population proportion $P = A/N$.


Theorem 3.2. The variance of $p$ is

$$V(p) = E(p - P)^2 = \frac{S^2}{n}\left(\frac{N-n}{N}\right) = \frac{PQ}{n}\left(\frac{N-n}{N-1}\right) \tag{3.6}$$

using (3.4).


Corollary 1. If $p$ and $P$ are the sample and population percentages, respectively,
falling into class $C$, (3.6) continues to hold for the variance of $p$.

Corollary 2. The variance of $\hat{A} = Np$, the estimated total number of units in
class $C$, is

$$V(\hat{A}) = \frac{N^2 PQ}{n}\left(\frac{N-n}{N-1}\right) \tag{3.7}$$
Theorem 3.3. An unbiased estimate of the variance of $p$, derived from the
sample, is

$$v(p) = s_p^2 = \frac{N-n}{(n-1)N}\,pq \tag{3.8}$$

Proof. In the corollary of theorem 2.4 it was shown that for a variate $y_i$ an
unbiased estimate of the variance of the sample mean $\bar{y}$ is

$$v(\bar{y}) = s_{\bar{y}}^2 = \frac{s^2}{n}\,\frac{(N-n)}{N} \tag{3.9}$$

For proportions, $p$ takes the place of $\bar{y}$, and in (3.5) we showed that

$$s^2 = \frac{n}{n-1}\,pq \tag{3.10}$$

Hence

$$v(p) = s_p^2 = \frac{N-n}{(n-1)N}\,pq \tag{3.11}$$


It follows that if $N$ is very large relative to $n$, so that the fpc is negligible, an
unbiased estimate of the variance of $p$ is

$$\frac{pq}{n-1}$$

The result may appear puzzling to some readers, since the expression $pq/n$ is
almost invariably used in practice for the estimated variance. The fact is that $pq/n$
is not unbiased even with an infinite population.


Corollary. An unbiased estimate of the variance of $\hat{A} = Np$, the estimated
total number of units in class $C$ in the population, is

$$v(\hat{A}) = s_{Np}^2 = \frac{N(N-n)}{n-1}\,pq \tag{3.12}$$


Example. From a list of 3042 names and addresses, a simple random sample of 200
names showed on investigation 38 wrong addresses. Estimate the total number of
addresses needing correction in the list and find the standard error of this estimate. We have

$$N = 3042, \qquad n = 200, \qquad a = 38, \qquad p = 0.19$$

The estimated total number of wrong addresses is

$$\hat{A} = Np = (3042)(0.19) = 578$$

$$s_{Np} = \sqrt{(3042)(2842)(0.19)(0.81)/199} = \sqrt{6686} = 81.8$$

Since the sampling ratio is under 7%, the fpc makes little difference. To remove it, replace
the term $N - n$ by $N$. If, in addition, we replace $n - 1$ by $n$, we have the simpler formula

$$s_{Np} = N\sqrt{pq/n} = (3042)\sqrt{(0.19)(0.81)/200} = 84.4$$

This is in fairly close agreement with the previous result, 81.8.
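
The arithmetic of this example is easy to reproduce. The following is a minimal sketch, assuming the estimated variance of the total given in (3.12); the function name `estimated_total_and_se` and the `use_fpc` switch are hypothetical conveniences introduced here for illustration.

```python
from math import sqrt

def estimated_total_and_se(N, n, a, use_fpc=True):
    """Estimated number of units in class C and its estimated standard error.

    With use_fpc=True the variance is N(N - n)pq/(n - 1), as in (3.12).
    With use_fpc=False, N - n is replaced by N and n - 1 by n, which gives
    the simpler formula N * sqrt(pq/n).
    """
    p = a / n
    q = 1 - p
    total = N * p
    if use_fpc:
        variance = N * (N - n) * p * q / (n - 1)
    else:
        variance = N ** 2 * p * q / n
    return total, sqrt(variance)

total, se_full = estimated_total_and_se(3042, 200, 38)
_, se_simple = estimated_total_and_se(3042, 200, 38, use_fpc=False)
print(round(total), round(se_full, 1), round(se_simple, 1))
```

Keeping the two variance formulas in one function makes it straightforward to see how little the fpc matters when the sampling fraction is small.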


The preceding formulas for the variance and the estimated variance of p hold 
only if the units are classified into C or C’ so that p is the ratio of the number of 
units in C in the sample to the total number of units in the sample. In many surveys 
each unit is composed of a group of elements, and it is the elements that are 
classified. A few examples are as follows: 


Sampling Unit Elements 

Family Members of the family 
Restaurant Employees 

Crate of eggs Individual eggs 

Peach tree Individual peaches 


If a simple random sample of units is drawn in order to estimate the proportion P
of elements in the population that belong to class C, the preceding formulas do not
apply. Appropriate methods are given in section 3.12.


3.3 THE EFFECT OF P ON THE STANDARD ERRORS 


Equation (3.6) shows how the variance of the estimated percentage changes 
with P, for fixed n and N. If the fpc is ignored, we have 


$$V(p) = \frac{PQ}{n}$$


The function $PQ$ and its square root are shown in Table 3.1. These functions
may be regarded as the variance and standard deviation, respectively, for a sample
of size 1.

TABLE 3.1
VALUES OF $PQ$ AND $\sqrt{PQ}$
P = Population percentage in class C

P              0    10     20     30     40     50     60     70     80    90   100

$PQ$           0   900   1600   2100   2400   2500   2400   2100   1600   900     0
$\sqrt{PQ}$    0    30     40     46     49     50     49     46     40    30     0

The functions have their greatest values when the population is equally divided
between the two classes, and are symmetrical about this point. The standard error
of $p$ changes relatively little when $P$ lies anywhere between 30 and 70%. At the
maximum value of $\sqrt{PQ}$, 50, a sample size of 100 is needed to reduce the standard
error of the estimate to 5%. To attain a 1% standard error requires a sample size
of 2500.
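
The sample sizes just quoted follow from solving $\sqrt{PQ/n}$ for $n$ with the fpc ignored. As a small illustrative sketch (the function name `sample_size_for_se` is hypothetical, and percentages are used throughout to match Table 3.1):

```python
def sample_size_for_se(P_percent, se_percent):
    """Sample size n at which sqrt(PQ/n) equals the target standard error.

    P and the standard error are both expressed in percent, and the
    finite population correction is ignored.
    """
    Q_percent = 100 - P_percent
    return P_percent * Q_percent / se_percent ** 2

print(sample_size_for_se(50, 5))   # worst case P = 50%, target s.e. 5%
print(sample_size_for_se(50, 1))   # worst case P = 50%, target s.e. 1%
```

Because $PQ$ is largest at $P = 50\%$, sample sizes computed at that value are conservative for any other $P$.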

This approach is not appropriate when interest lies in the total number of units
in the population that are in class $C$. In this event it is more natural to ask: Is the
estimate likely to be correct to within, say, 7% of the true total? Thus we tend to
think of the standard error expressed as a fraction or percentage of the true value,
$NP$. The fraction is

$$\frac{\sigma_{Np}}{NP} = \frac{N\sqrt{PQ}}{\sqrt{n}\,NP}\sqrt{\frac{N-n}{N-1}} = \frac{1}{\sqrt{n}}\sqrt{\frac{Q}{P}}\sqrt{\frac{N-n}{N-1}} \tag{3.13}$$

This quantity is called the coefficient of variation of the estimate. If the fpc is
ignored, the coefficient is $\sqrt{Q/nP}$. The ratio $\sqrt{Q/P}$, which might be considered the
coefficient of variation for a sample of size 1, is shown in Table 3.2.


TABLE 3.2
VALUES OF $\sqrt{Q/P}$ FOR DIFFERENT VALUES OF $P$


P = Population percentage in class C


P                0     0.1    0.5     1      5     10     20

$\sqrt{Q/P}$     ∞    31.6   14.1    9.9    4.4    3.0    2.0

P               30     40     50     60     70     80     90

$\sqrt{Q/P}$   1.5    1.2    1.0    0.8    0.7    0.5    0.3


For a fixed sample size, the coefficient of variation of the estimated total in class
$C$ decreases steadily as the true percentage in $C$ increases. The coefficient is high
when $P$ is less than 5%. Very large samples are needed for precise estimates of the
total number possessing any attribute that is rare in the population. For $P = 1\%$,
Table 3.2 gives $\sqrt{Q/P} = 9.9$, so we must have $\sqrt{n} = 99$ in order to reduce the
coefficient of variation of the estimate to 0.1 or 10%. This gives a sample size of 9801
(about 9900 if the unrounded value of $\sqrt{Q/P}$ is used). Simple random sampling, or any
method of sampling that is adapted for general purposes, is an expensive method
of estimating the total number of units of a scarce type.
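
To see how quickly the required sample size grows as the attribute becomes rarer, the relation $\text{cv} = \sqrt{Q/nP}$ can be solved for $n$. The sketch below ignores the fpc; the function names are hypothetical, and the percentages in the loop are chosen only for illustration.

```python
from math import sqrt

def cv_of_total(P_percent, n):
    """Coefficient of variation sqrt(Q/(nP)) of the estimated total, fpc ignored."""
    Q_percent = 100 - P_percent
    return sqrt(Q_percent / (n * P_percent))

def sample_size_for_cv(P_percent, cv):
    """Sample size n at which sqrt(Q/(nP)) equals the target coefficient of variation."""
    Q_percent = 100 - P_percent
    return (Q_percent / P_percent) / cv ** 2

for P in (0.5, 1, 5, 10, 50):
    print(P, round(sample_size_for_cv(P, 0.1)))
```

For $P = 1\%$ the unrounded calculation gives about 9900, whereas working from the rounded table entry 9.9 gives $99^2 = 9801$; the difference is immaterial for planning purposes.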


3.4 THE BINOMIAL DISTRIBUTION 


Since the population is of a particularly simple type, in which the $y_i$ are either 1
or 0, we can find the actual frequency distribution of the estimate $p$ and not merely
its mean and variance.

The population contains $A$ units that are in class $C$ and $N - A$ units in $C'$, where
$P = A/N$. If the first unit that is drawn happens to be in $C$, there will remain in the
population $A - 1$ units in $C$ and $N - A$ in $C'$. Thus the proportion of units in $C$,
after the first draw, changes slightly to $(A - 1)/(N - 1)$. Alternatively, if the first
unit drawn is in $C'$, the proportion in $C$ changes to $A/(N - 1)$. In sampling without
replacement, the proportion keeps changing in this way throughout the draw. In
the present section these variations are ignored, that is, $P$ is assumed constant.
This amounts to assuming that $A$ and $N - A$ are both large relative to the sample
size $n$, or that sampling is with replacement.

With this assumption, the process of drawing the sample consists of a series of $n$
trials, in each of which the probability that the unit drawn is in $C$ is $P$. This
situation gives rise to the familiar binomial frequency distribution for the number
of units in $C$ in the sample. The probability that the sample contains $a$ units in $C$ is

$$\Pr(a) = \frac{n!}{a!\,(n-a)!}\,P^{a}Q^{n-a} \tag{3.14}$$


From this expression we may tabulate the frequency distribution of a, of 
p=a/n, or of the estimated total Np. 
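
Where printed tables are not at hand, the binomial terms can be tabulated directly. The sketch below is an illustration only; the function names `binomial_pmf` and `tabulate` are hypothetical, and the values $n = 4$, $P = 0.5$, $N = 1000$ are assumed purely for demonstration.

```python
from math import comb

def binomial_pmf(a, n, P):
    """Binomial probability that a sample of n contains a units in class C."""
    return comb(n, a) * P ** a * (1 - P) ** (n - a)

def tabulate(n, P, N):
    """Rows (a, p = a/n, estimated total Np, probability) for a = 0, ..., n."""
    return [(a, a / n, N * a / n, binomial_pmf(a, n, P)) for a in range(n + 1)]

for row in tabulate(n=4, P=0.5, N=1000):
    print(row)
```

The same probabilities apply to $a$, to $p = a/n$, and to the estimated total $Np$, since each is a fixed multiple of $a$.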

There are three comprehensive sets of tables. All give P by intervals of 0.01.
The ranges for n are as follows.


U.S. Bureau of Standards (1950):

n = 1(1)49 (i.e., goes from 1 to 49 by intervals of 1).
Romig (1952): n = 50(5)100.
Harvard Computation Laboratory (1955):

n = 1(1)50(2)100(10)200(20)500(50)1000


3.5 THE HYPERGEOMETRIC DISTRIBUTION 


The distribution of $p$ can be found without the assumption that the population is
large in relation to the sample. The numbers of units in the two classes $C$ and $C'$ in
the population are $A$ and $A'$, respectively. We will calculate the probability that
the corresponding numbers in the sample are $a$ and $a'$, where

$$a + a' = n, \qquad A + A' = N$$

In simple random sampling each of the $\binom{N}{n}$ different selections of $n$ units out of
$N$ has an equal chance of being drawn. To find the probability wanted, we count
how many of these samples contain exactly $a$ units from $C$ and $a'$ from $C'$. The
number of different selections of $a$ units among the $A$ that are in $C$ is $\binom{A}{a}$,
whereas the number of different selections of $a'$ among $A'$ is $\binom{A'}{a'}$. Each selection
of the first type can be combined with any one of the second to give a different
sample of the required type. The total number of samples of the required type is
therefore

$$\binom{A}{a}\binom{A'}{a'}$$

Hence, if a simple random sample of size $n$ is drawn, the probability that it is of
the required type is

$$\Pr(a, a' \mid A, A') = \binom{A}{a}\binom{A'}{a'} \Big/ \binom{N}{n} \tag{3.15}$$

This is the frequency distribution of $a$ or $np$, from which that of $p$ is immediately
derivable. The distribution is called the hypergeometric distribution.
For computing purposes the hypergeometric probability (3.15) may be written
as follows.

$$\frac{n!}{a!\,(n-a)!}\,\frac{A(A-1)\cdots(A-a+1)\,(A')(A'-1)\cdots(A'-a'+1)}{N(N-1)\cdots(N-n+1)} \tag{3.16}$$


Example. A family of eight contains three males and five females. Find the frequency
distribution of the number of males in a simple random sample of size 4. In this case

$$A = 3; \qquad A' = 5; \qquad N = 8; \qquad n = 4$$

From (3.16) the distribution of the number of males, $a$, is as follows:

a    Probability

0    $\frac{4!}{0!\,4!}\cdot\frac{5\cdot4\cdot3\cdot2}{8\cdot7\cdot6\cdot5} = \frac{1}{14}$

1    $\frac{4!}{1!\,3!}\cdot\frac{3\cdot5\cdot4\cdot3}{8\cdot7\cdot6\cdot5} = \frac{6}{14}$

2    $\frac{4!}{2!\,2!}\cdot\frac{3\cdot2\cdot5\cdot4}{8\cdot7\cdot6\cdot5} = \frac{6}{14}$

3    $\frac{4!}{3!\,1!}\cdot\frac{3\cdot2\cdot1\cdot5}{8\cdot7\cdot6\cdot5} = \frac{1}{14}$

4    Impossible $= 0$

The reader may verify that the mean number of males is $\frac{3}{2}$ and the variance is $\frac{15}{28}$. These
results agree with the formulas previously established in section 3.2, which give

$$E(np) = nP = \frac{nA}{N} = \frac{(4)(3)}{8} = \frac{3}{2}$$

$$V(np) = nPQ\,\frac{N-n}{N-1} = 4\cdot\frac{3}{8}\cdot\frac{5}{8}\cdot\frac{4}{7} = \frac{15}{28}$$
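
The family example can be verified with exact rational arithmetic. The following minimal sketch computes the hypergeometric probabilities from the counting argument of (3.15) and then the mean and variance of the number of males; the function name `hypergeom_pmf` is hypothetical.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(a, n, A, N):
    """Probability of a units from class C in a simple random sample of n.

    math.comb(k, r) returns 0 when r > k, so impossible outcomes such as
    four males drawn from a family with three get probability 0.
    """
    return Fraction(comb(A, a) * comb(N - A, n - a), comb(N, n))

A, N, n = 3, 8, 4                      # three males in a family of eight, sample of 4
dist = {a: hypergeom_pmf(a, n, A, N) for a in range(n + 1)}
mean = sum(a * pr for a, pr in dist.items())
variance = sum((a - mean) ** 2 * pr for a, pr in dist.items())

print(dist)
print(mean, variance)
```

Using `Fraction` keeps every probability as an exact ratio, so the comparison with $nP$ and $nPQ(N-n)/(N-1)$ involves no rounding.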


3.6 CONFIDENCE LIMITS 


We first discuss the meaning of confidence limits in the case of qualitative
characteristics. In the sample, $a$ out of $n$ fall in class $C$. Suppose that inferences are
to be made about the number $A$ in the population that fall in class $C$. For an
upper confidence limit to $A$, we compute a value $\hat{A}_U$ such that for this value the
probability of getting $a$ or less falling in $C$ in the sample is some small quantity $\alpha_U$,
for example, 0.025. Formally, $\hat{A}_U$ satisfies the equation

$$\sum_{j=0}^{a} \Pr(j, n-j \mid \hat{A}_U, N - \hat{A}_U) = \alpha_U \tag{3.17}$$

where Pr is the probability term for the hypergeometric distribution, as defined in
(3.15).

When $\alpha_U$ is chosen in advance, (3.17) requires in general a nonintegral value of
$\hat{A}_U$ to satisfy it, whereas conceptually $\hat{A}_U$ should be a whole number. In practice
we choose $\hat{A}_U$ as the smallest integral value of $A$ such that the left side of (3.17) is
less than or equal to $\alpha_U$. Similarly, the lower confidence limit $\hat{A}_L$ is the largest
integral value such that

$$\sum_{j=a}^{n} \Pr(j, n-j \mid \hat{A}_L, N - \hat{A}_L) \le \alpha_L \tag{3.18}$$

Confidence limits for $P$ are then found by taking $\hat{P}_U = \hat{A}_U/N$, $\hat{P}_L = \hat{A}_L/N$.
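
The integer search described by (3.17) and (3.18) is simple to carry out by machine. The sketch below is one possible implementation under stated assumptions, not a reproduction of any published table: the function names are hypothetical, and a plain linear scan over $A$ is used because the tail probabilities change monotonically with $A$.

```python
from math import comb

def hyper_pmf(j, n, A, N):
    """Hypergeometric probability (3.15) of j units from C in a sample of n."""
    return comb(A, j) * comb(N - A, n - j) / comb(N, n)

def upper_limit(a, n, N, alpha_u=0.025):
    """Smallest integer A for which Pr(a or fewer in C) <= alpha_u."""
    for A in range(a, N + 1):
        if sum(hyper_pmf(j, n, A, N) for j in range(a + 1)) <= alpha_u:
            return A
    return N    # no value of A satisfies the condition (for example, a == n)

def lower_limit(a, n, N, alpha_l=0.025):
    """Largest integer A for which Pr(a or more in C) <= alpha_l."""
    for A in range(N, -1, -1):
        if sum(hyper_pmf(j, n, A, N) for j in range(a, n + 1)) <= alpha_l:
            return A
    return 0    # no value of A satisfies the condition (for example, a == 0)

# Illustrative call: 37 of 100 sampled units in class C, population of 500.
N, n, a = 500, 100, 37
A_lower, A_upper = lower_limit(a, n, N), upper_limit(a, n, N)
print(A_lower / N, A_upper / N)
```

Limits for $P$ follow by dividing the integer limits by $N$, as in the text.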


Numerous methods are available for computing confidence limits. 


Exact Methods 


Chung and DeLury (1950) present charts of the 90, 95, and 99% limits for P for
N = 500, 2500, and 10,000. Values for intermediate population sizes are obtainable
by interpolation. Lieberman and Owen (1961) give tables of individual and
cumulative terms of the hypergeometric distribution, but N extends only to 100.


The Normal Approximation 


From (3.8) for the estimated variance of $p$, one form of the normal approximation
to the confidence limits for $P$ is

$$p \pm \left[t\sqrt{1-f}\,\sqrt{pq/(n-1)} + \frac{1}{2n}\right] \tag{3.19}$$

where $f = n/N$ and $t$ is the normal deviate corresponding to the confidence
probability. Use of the more familiar term $\sqrt{pq/n}$ seldom makes an appreciable
difference. The last term on the right is a correction for continuity. This produces
only a slight improvement in the approximation. However, without the correction,
the normal approximation usually gives too narrow a confidence interval.


TABLE 3.3
SMALLEST VALUES OF np FOR USE OF THE NORMAL
APPROXIMATION

          np = Number Observed       n =
  p       in the Smaller Class       Sample Size
 0.5              15                     30
 0.4              20                     50
 0.3              24                     80
 0.2              40                    200
 0.1              60                    600
 0.05             70                   1400
 ~0*              80                      ∞


* This means that p is extremely small, so that np follows
the Poisson distribution.


The error in the normal approximation depends on all the quantities $n$, $p$, $N$, $\alpha_U$,
and $\alpha_L$. The quantity to which the error is most sensitive is $np$ or more specifically
the number observed in the smaller class. Table 3.3 gives working rules for
deciding when the normal approximation (3.19) may be used.

The rules in Table 3.3 are constructed so that with 95% confidence limits the 
true frequency with which the limits fail to enclose P is not greater than 5.5%. 
Furthermore, the probability that the upper limit is below P is between 2.5 and 
3.5%, and the probability that the lower limit exceeds P is between 2.5 and 1.5%. 


Example 1. In a simple random sample of size 100, from a population of size 500, there
are 37 units in class C. Find the 95% confidence limits for the proportion and for the total
number in class C in the population. In this example

$$n = 100, \qquad N = 500, \qquad p = 0.37$$

The example lies in the range in which the normal approximation is recommended. The
estimated standard error of $p$ is

$$\sqrt{(1-f)pq/(n-1)} = \sqrt{(0.8)(0.37)(0.63)/99} = 0.0434$$

The correction for continuity, $1/2n$, equals 0.005. Hence the 95% limits for $P$ are
estimated as

$$0.37 \pm (1.96 \times 0.0434 + 0.005) = 0.37 \pm 0.090$$

$$\hat{P}_L = 0.280, \qquad \hat{P}_U = 0.460$$

The limits as read from the charts by Chung and DeLury are 0.285 and 0.462, respectively.
To find limits for the total number in class C in the population, we multiply by $N$,
obtaining 140 and 230, respectively.
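
The calculation in Example 1 can be written as a short routine. This is a minimal sketch of the normal approximation (3.19) with the continuity correction; the function name `normal_limits` is hypothetical, and `t = 1.96` is the usual normal deviate for 95% limits.

```python
from math import sqrt

def normal_limits(a, n, N, t=1.96):
    """Normal-approximation confidence limits for P, following (3.19).

    Returns (lower, upper, estimated standard error of p). The half-width
    is t * se plus the continuity correction 1/(2n).
    """
    p = a / n
    q = 1 - p
    f = n / N
    se = sqrt((1 - f) * p * q / (n - 1))
    half_width = t * se + 1 / (2 * n)
    return p - half_width, p + half_width, se

N = 500
lower, upper, se = normal_limits(a=37, n=100, N=N)
print(round(se, 4), round(lower, 3), round(upper, 3))
print(round(N * lower), round(N * upper))     # limits for the total in class C
```

Before relying on such limits, the observed count in the smaller class should be checked against the working rules of Table 3.3.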


Binomial Approximations


When the normal approximation does not apply, limits for P may be found from 
the binomial tables (section 3.4) and adjusted, if necessary, to take account of the 
fpc. Table VIII in Fisher and Yates’ Statistical Tables (1957) gives binomial 
confidence limits for P for any value of n, and is a useful alternative to the ordinary 
binomial tables. Example 2 shows how the binomial approximation is computed. 


Example 2. For another item in the sample in example 1, nine of the 100 units fall in
class C. From Romig’s table for $n = 100$ the 95% limits for $P$ are found to be 0.041 and
0.165. (The Fisher-Yates tables give 0.042 and 0.164.) If $f$, the sampling fraction, is less
than 5%, limits found in this way are close enough for most purposes. In this example,
$f = 0.2$ and adjustment is needed.

To apply the adjustment, we shorten the interval between $p$ and each limit by the factor
$\sqrt{1-f} = \sqrt{0.8} = 0.894$. The adjusted limits are as follows:

$$\hat{P}_L = 0.090 - (0.894)(0.090 - 0.041) = 0.046$$
$$\hat{P}_U = 0.090 + (0.894)(0.165 - 0.090) = 0.157$$

The limits read from the charts by Chung and DeLury are 0.045 and 0.157, respectively.

Burstein (1975) has produced a variant of this calculation that is slightly more accurate.
Suppose that $a$ units out of $n$ are in class $C$ (in this example, $a = 9$; $n = 100$). In $\hat{P}_L$, replace
$a/n = 0.090$ by $(a - 0.5)/n = 0.085$. In $\hat{P}_U$, replace $a/n$ by $(a + a/n)/n = 0.0909$. Also,
$(1-f)$ is taken as $(N-n)/(N-1)$. Thus, by Burstein’s method,

$$\hat{P}_L = 0.085 - (0.895)(0.085 - 0.041) = 0.046$$
$$\hat{P}_U = 0.0909 + (0.895)(0.165 - 0.0909) = 0.157$$

there being no change in the limits in this example.
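
Both adjustments in Example 2 are mechanical once the binomial limits have been read from a table. The sketch below takes those tabled limits as inputs rather than computing them; the function names are hypothetical, and the second function follows the description of Burstein's variant given in the example.

```python
from math import sqrt

def fpc_adjusted_limits(a, n, N, binom_lower, binom_upper):
    """Shorten the distance from p to each binomial limit by sqrt(1 - f)."""
    p = a / n
    k = sqrt(1 - n / N)
    return p - k * (p - binom_lower), p + k * (binom_upper - p)

def burstein_limits(a, n, N, binom_lower, binom_upper):
    """Variant in which p is shifted slightly and 1 - f is taken as (N - n)/(N - 1)."""
    k = sqrt((N - n) / (N - 1))
    p_low = (a - 0.5) / n            # replaces a/n in the lower limit
    p_up = (a + a / n) / n           # replaces a/n in the upper limit
    return p_low - k * (p_low - binom_lower), p_up + k * (binom_upper - p_up)

# Tabled binomial limits for a = 9, n = 100 as quoted in Example 2.
simple = fpc_adjusted_limits(9, 100, 500, 0.041, 0.165)
variant = burstein_limits(9, 100, 500, 0.041, 0.165)
print([round(x, 3) for x in simple], [round(x, 3) for x in variant])
```

In this example the two methods agree to three decimals, as the text notes.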


Example 3. In auditing records in which a very low error rate is demanded, the upper
confidence limit for $A$ is primarily of interest. Suppose that 200 of 1000 records are verified
and that the batch of 1000 is accepted if no errors are found. Special tables have been
constructed to give the upper confidence limit for the number of errors in the batch. A good
approximation results from the following relation. The probability that no errors are found
in $n$ when $A$ errors are present in $N$ is, from the hypergeometric distribution,

$$\frac{(N-A)(N-A-1)\cdots(N-A-n+1)}{N(N-1)\cdots(N-n+1)} \doteq \left(\frac{N-A-u}{N-u}\right)^{n}$$

where $u = (n-1)/2$. For example, with $n = 200$, $A = 10$, $N = 1000$, the approximation
gives $(890.5/900.5)^{200}$, which is found by logs to be 0.107. Thus $A = 10$ (a 1% error rate) is
approximately the 90% upper confidence limit for the number of errors in the batch.
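
The quality of the approximation in Example 3 can be judged by setting it beside the exact hypergeometric probability. The following is a minimal sketch; the function names are hypothetical, and the exact value is computed as a ratio of binomial coefficients, which is equivalent to the product of ratios on the left of the relation above.

```python
from math import comb

def prob_no_errors_exact(n, A, N):
    """Exact probability that a simple random sample of n contains none of the A errors."""
    return comb(N - A, n) / comb(N, n)

def prob_no_errors_approx(n, A, N):
    """Approximation ((N - A - u)/(N - u))**n with u = (n - 1)/2."""
    u = (n - 1) / 2
    return ((N - A - u) / (N - u)) ** n

exact = prob_no_errors_exact(200, 10, 1000)
approx = prob_no_errors_approx(200, 10, 1000)
print(exact, approx)
```

With modern computing the exact value is as easy to obtain as the approximation, but the closed form remains convenient for solving for $A$ or $n$ by hand.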


3.7 CLASSIFICATION INTO MORE THAN TWO CLASSES 


Frequently, in the presentation of results, the units are classified into more than 
two classes. Thus a sample from a human population may be arranged in 15 
five-year age groups. Even when a question is supposed to be answered by a 
simple “yes” or “no,” the results actually obtained may fall into four classes: 
“yes,” “no,” “don’t know,” and “no answer.” The extension of the theory to such 
cases is illustrated by the situation in which there are three classes. 

We suppose that the number falling in the $i$th class is $A_i$ in the population and $a_i$
in the sample, where

$$N = \sum A_i, \qquad n = \sum a_i, \qquad P_i = \frac{A_i}{N}$$

When the sample size $n$ is small in relation to all the $A_i$, the probabilities $P_i$ may
be considered effectively constant throughout the drawing of the sample. The
probability of drawing the observed sample is given by the multinomial expression

$$\Pr(a_i) = \frac{n!}{a_1!\,a_2!\,a_3!}\,P_1^{a_1}P_2^{a_2}P_3^{a_3} \tag{3.20}$$

This is the appropriate extension of the binomial distribution and is a good
approximation when the sampling fraction is small.


The correct expression for the probability of drawing the observed sample is

$$\Pr(a_i \mid A_i) = \binom{A_1}{a_1}\binom{A_2}{a_2}\binom{A_3}{a_3} \Big/ \binom{N}{n} \tag{3.21}$$

This expression is the natural extension of (3.15), section 3.5, for the
hypergeometric distribution. The numerator is the number of distinct samples of
size $n$ that can be formed with $a_1$ units in class 1, $a_2$ in class 2, and $a_3$ in class 3.
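
To see how close the multinomial expression is to the exact probability when the sampling fraction is small, the two can be evaluated side by side. The sketch below is illustrative only: the function names are hypothetical, and the class sizes and sample counts are invented for the demonstration rather than taken from any survey.

```python
from math import comb, factorial, prod

def multinomial_prob(a, P):
    """Multinomial probability of class counts a under class proportions P."""
    coef = factorial(sum(a))
    for a_i in a:
        coef //= factorial(a_i)        # exact integer division at every step
    return coef * prod(p_i ** a_i for p_i, a_i in zip(P, a))

def hypergeometric_prob(a, A):
    """Exact probability of class counts a when sampling without replacement."""
    return prod(comb(A_i, a_i) for A_i, a_i in zip(A, a)) / comb(sum(A), sum(a))

A = (5000, 3000, 2000)                 # hypothetical class sizes in the population
a = (5, 3, 2)                          # hypothetical class counts in a sample of 10
P = tuple(A_i / sum(A) for A_i in A)

approx = multinomial_prob(a, P)
exact = hypergeometric_prob(a, A)
print(approx, exact, exact / approx)
```

Repeating the comparison with much smaller class sizes shows the two expressions drifting apart as the sampling fraction grows.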


3.8 CONFIDENCE LIMITS WITH MORE THAN TWO CLASSES 


Two different cases must be distinguished.

Case 1. We calculate

$$p = \frac{\text{number in any one class in sample}}{n} = \frac{a_1}{n}$$

or

$$p = \frac{\text{total number in a group of classes}}{n} = \frac{a_1 + a_2 + a_3}{n}$$

In either of these situations, although the original classification contains more
than two classes, $p$ itself is obtained from a subdivision of the $n$ units into only two
classes. The theory already presented applies to this case. Confidence limits are
calculated as described in section 3.6.

Case 2. Sometimes certain classes are omitted, p being computed from a 
breakdown of the remaining classes into two parts. For example, we might omit 
persons who did not know or gave no answer and consider the ratio of the number 
of “yes” answers to “yes” plus “no” answers. Ratios that are structurally of this 
type are often of interest in sample surveys. The denominator of such a ratio is not 
n but some smaller number n’. 

Although n' varies from sample to sample, previous results can still be used by 
considering the conditional distribution of p in samples in which both n and n’ are 
fixed. This device was already employed in section 2.12. Suppose that 


$$p = \frac{a_1}{a_1 + a_2}, \qquad n' = a_1 + a_2, \qquad n = a_1 + a_2 + a_3$$

so that $a_3$ is the number in the sample falling in classes in which we are not at the 
moment interested. Then, as shown in the next section, the conditional distribution 
of $a_1$ and $a_2$ is the hypergeometric distribution obtained when the sample is of 
size $n'$ and the population of size $N' = A_1 + A_2$. Hence, from (3.19), the normal 
approximations to conditional confidence limits for $P = A_1/(A_1 + A_2)$ are 

$$p \pm \left[\, t\sqrt{1 - \frac{n'}{N'}}\,\sqrt{\frac{pq}{n' - 1}} + \frac{1}{2n'} \right] \tag{3.22}$$

If the value of $N'$ is not known, $n/N$ may be substituted for $n'/N'$ in the fpc term 
in (3.22). 


3.9 THE CONDITIONAL DISTRIBUTION OF p 


To find this distribution, we restrict our attention to samples of size $n$ in which 
$n' = a_1 + a_2$ fall in classes 1 and 2. The number of distinct samples of this type is 

$$\binom{N'}{n'}\binom{N - N'}{n - n'} = \binom{A_1 + A_2}{a_1 + a_2}\binom{A_3}{a_3} \tag{3.23}$$

Among these samples, the number that have $a_1$ in class 1 and $a_2$ in class 2 has 
already been given as the numerator in (3.21), section 3.7. Dividing this 
numerator by (3.23), we have 

$$\Pr(a_1, a_2 \mid A_1, A_2, n') = \binom{A_1}{a_1}\binom{A_2}{a_2} \bigg/ \binom{A_1 + A_2}{a_1 + a_2} \tag{3.24}$$

This is an ordinary hypergeometric distribution for a sample of size $n'$ from a 
population of size $N' = A_1 + A_2$. 
Example. Consider a population that consists of the five units, b, c, d, e, f, that fall in 
three classes. 

Class    A_i    Units Denoted By 
1        1      b 
2        2      c, d 
3        2      e, f 

With random samples of size 3, we wish to estimate $P = A_1/(A_1 + A_2)$ or, in this case, 1/3. 
Thus $N = 5$ and $N' = 3$. 

There are 10 possible samples of size 3, all with equal initial probabilities. These are 
grouped according to the value of $n'$. 

n' = 1 
                                     Conditional 
Sample         a_1    a_2    p       Probability    (p - P) 
bef            1      0      1       1/3            2/3 
cef or def     0      1      0       2/3            -1/3 

If samples are specified by the values of $a_1$, $a_2$, only two types are obtainable: $a_1 = 1$, 
$a_2 = 0$; $a_1 = 0$, $a_2 = 1$. Their conditional probabilities, 1/3 and 2/3, respectively, agree with the 
general expression (3.24). Furthermore, 

$$E(p) = \tfrac{1}{3}(1) + \tfrac{2}{3}(0) = \tfrac{1}{3}, \qquad V(p) = \tfrac{1}{3}\left(\tfrac{2}{3}\right)^2 + \tfrac{2}{3}\left(\tfrac{1}{3}\right)^2 = \tfrac{2}{9}$$

The estimate p is unbiased, and its variance agrees with the general formula 

$$V(p) = \frac{N' - n'}{N' - 1}\,\frac{PQ}{n'} = \left(\frac{2}{2}\right)\left(\frac{1}{3}\right)\left(\frac{2}{3}\right) = \frac{2}{9}$$

For $n' = 2$ there are six possible samples, which give only two sets of values of $a_1$, $a_2$. 

n' = 2 
                                            Conditional 
Sample                   a_1    a_2    p    Probability    (p - P) 
bce, bcf, bde, or bdf    1      1      1/2  2/3            1/6 
cde or cdf               0      2      0    1/3            -1/3 

The estimate is again unbiased and its variance is 

$$V(p) = \tfrac{2}{3}\left(\tfrac{1}{6}\right)^2 + \tfrac{1}{3}\left(\tfrac{1}{3}\right)^2 = \tfrac{1}{18}$$

which may be verified from the general formula. Note that the variance is only one fourth of 
that obtained when $n' = 1$. In a conditional approach the variance changes with the 
configuration of the sample that was drawn. 

For $n' = 3$, there is only one possible sample, bcd. This gives the correct population 
fraction, 1/3. The conditional variance of p is zero, as indicated by the general formula, which 
reduces to zero when $N' = n'$. 
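
The enumeration in this example is small enough to be repeated mechanically. The sketch below lists all 10 samples of size 3, groups them by $n'$, and computes the conditional mean and variance of p with exact fractions. The names `UNIT_CLASS` and `conditional_summary` are illustrative and not part of the text.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations

# Class membership of the five units in the example.
UNIT_CLASS = {"b": 1, "c": 2, "d": 2, "e": 3, "f": 3}
P = Fraction(1, 3)  # A_1 / (A_1 + A_2)

def conditional_summary(n=3):
    """Return {n': (number of samples, E(p), E(p - P)^2)}, conditional on n'.

    With n = 3 and only two units in class 3, every sample has n' >= 1,
    so the ratio a_1 / n' is always defined.
    """
    p_values = defaultdict(list)
    for sample in combinations(UNIT_CLASS, n):
        a1 = sum(UNIT_CLASS[u] == 1 for u in sample)
        a2 = sum(UNIT_CLASS[u] == 2 for u in sample)
        p_values[a1 + a2].append(Fraction(a1, a1 + a2))
    summary = {}
    for n_prime, ps in sorted(p_values.items()):
        mean = sum(ps) / len(ps)
        variance = sum((p - P) ** 2 for p in ps) / len(ps)
        summary[n_prime] = (len(ps), mean, variance)
    return summary

for n_prime, (count, mean, variance) in conditional_summary().items():
    print(n_prime, count, mean, variance)
```

The three groups contain 3, 6, and 1 samples, each with conditional mean 1/3, and conditional variances 2/9, 1/18, and 0.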


3.10 PROPORTIONS AND TOTALS OVER SUBPOPULATIONS 


If separate estimates are to be made for each of a number of subpopulations or 
domains of study to which the units in the sample are allotted, the results in 
sections 3.8 and 3.9 are applicable. The sample data may be presented as follows. 


                    Domain 1       Domain 2      ...     Domain k      Total 
Class               C     C'       C     C'      ...     C     C' 
Number of units     a_1   a_1'     a_2   a_2'    ...     a_k   a_k'    n 


Of the $n$ units, $(a_1 + a_1')$ are found to fall in domain 1 and of these $a_1$ fall in class 
C. The proportion falling in class C in domain 1 is estimated by $p_1 = a_1/(a_1 + a_1')$. 
The frequency distribution and confidence limits for $p_1$ were discussed under Case 
2 in sections 3.8 and 3.9. 


For estimating the total number $A_1$ of units in class C in domain 1, there are two 
possibilities. If $N_1$, the total number of units in domain 1 in the population, is 
known, we may use the conditional estimate 

$$\hat{A}_1 = N_1 p_1 = \frac{N_1 a_1}{a_1 + a_1'} \tag{3.25}$$

Its standard error is computed as 

$$s(\hat{A}_1) = N_1 \sqrt{1 - (n_1/N_1)}\,\sqrt{p_1 q_1/(n_1 - 1)} \tag{3.26}$$

where $n_1 = a_1 + a_1'$. 
If $N_1$ is not known, the estimate is 

$$\hat{A}_1' = \frac{N a_1}{n} \tag{3.27}$$

with estimated standard error 

$$s(\hat{A}_1') = N \sqrt{1 - (n/N)}\,\sqrt{pq/(n - 1)} \tag{3.28}$$

where $p = a_1/n$. 
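
The two estimates of a domain total and their standard errors, (3.25) to (3.28), can be sketched as follows. The function names and the numerical values in the usage lines ($N = 10{,}000$, $n = 400$, $n_1 = 100$, $a_1 = 40$, $N_1 = 2400$) are hypothetical, chosen only to illustrate the calculation.

```python
from math import sqrt

def domain_total_known_size(a1, n1, N1):
    """Estimate (3.25) and standard error (3.26): N_1 known."""
    p1 = a1 / n1
    estimate = N1 * p1
    se = N1 * sqrt(1 - n1 / N1) * sqrt(p1 * (1 - p1) / (n1 - 1))
    return estimate, se

def domain_total_unknown_size(a1, n, N):
    """Estimate (3.27) and standard error (3.28): N_1 not known."""
    p = a1 / n
    estimate = N * p
    se = N * sqrt(1 - n / N) * sqrt(p * (1 - p) / (n - 1))
    return estimate, se

# Hypothetical survey: 40 of the 100 sampled units in domain 1 are in class C.
print(domain_total_known_size(a1=40, n1=100, N1=2400))
print(domain_total_unknown_size(a1=40, n=400, N=10_000))
```

In this illustration the estimate that uses the known $N_1$ has the smaller standard error, in line with the comparison requested in exercise 3.8.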
3.11 COMPARISONS BETWEEN DIFFERENT DOMAINS 


Since proportions are estimated independently in different domains, comparisons 
between such proportions are made by standard elementary methods. For 
example, to test whether the proportion $p_1 = a_1/(a_1 + a_1')$ differs significantly 
from the proportion $p_2 = a_2/(a_2 + a_2')$, we form the usual 2 x 2 table. 


               Domain 
             1        2 
C            a_1      a_2 
C'           a_1'     a_2' 
Total        n_1      n_2 


The ordinary $\chi^2$ test (Fisher, 1958) or the normal approximation to the 
distribution of $(p_1 - p_2)$ is appropriate. Similarly, comparisons among proportions 
for more than two domains are made by the methods for a 2 x k contingency table. 

Occasionally it is desired to test whether $a_1$ differs significantly from $a_2$; for 
example, whether the number of Republicans who favor some proposal is greater 
than the number of Democrats in favor. On the null hypothesis that these two 
numbers are equal in the population, the total $n' = a_1 + a_2$ in the two classes in 
question should divide with equal probability between the two classes. Consequently 
we may regard $a_1$ as a binomial number of successes in $n'$ trials, with 
probability of success 1/2 on the null hypothesis. It may be verified that the normal 
deviate (corrected for continuity) is 

$$\frac{2\left(\left|a_1 - \tfrac{1}{2}n'\right| - \tfrac{1}{2}\right)}{\sqrt{n'}}$$
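
As a small numerical illustration of this deviate, the following sketch evaluates it for two class counts. The function name and the counts 60 and 40 are hypothetical.

```python
from math import sqrt

def equal_split_deviate(a1, a2):
    """Continuity-corrected normal deviate for testing whether a1 and a2
    differ, treating a1 as binomial with n' = a1 + a2 trials and
    probability 1/2 under the null hypothesis."""
    n_prime = a1 + a2
    return 2 * (abs(a1 - n_prime / 2) - 0.5) / sqrt(n_prime)

print(equal_split_deviate(60, 40))  # 2(10 - 0.5)/10 = 1.9
```

A deviate of 1.9 falls just short of the two-sided 5% point of the normal distribution.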


3.12 ESTIMATION OF PROPORTIONS IN CLUSTER SAMPLING 


As mentioned in section 3.2, the preceding methods are not valid if each unit is a 
cluster of elements and we are estimating the proportion of elements that fall into 
class C. 

If each unit contains the same number $m$ of elements, let $p_i = a_i/m$ be the 
proportion of elements in the $i$th unit that fall into class C. The proportion falling 
in C in the sample is 

$$p = \frac{\sum a_i}{nm} = \frac{\sum p_i}{n}$$

that is, the estimate p is the unweighted mean of the quantities $p_i$. Consequently, if 
$y_i$ is replaced by $p_i$, the formulas in Chapter 2 may be applied directly to give the 
true and estimated variance of p. 

$$V(p) = \frac{1 - f}{n}\,\frac{\sum_{i=1}^{N} (p_i - P)^2}{N - 1} \tag{3.29}$$

An unbiased sample estimate of this variance is 

$$v(p) = \frac{1 - f}{n}\,\frac{\sum_{i=1}^{n} (p_i - p)^2}{n - 1} \tag{3.30}$$

Example 1. A group of 61 leprosy patients were treated with a drug for 48 weeks. To 
measure the effect of the drug on the leprosy bacilli, the presence of bacilli at six sites on the 
body of each patient was tested bacteriologically. Among the 366 sites, 153, or 41.8%, 
were negative. What is the standard error of this percentage? 

This example comes from a controlled experiment rather than a survey, but it illustrates 
how erroneous the binomial formula may be. By the binomial formula, we have $n = 366$, 
and 

$$\text{s.e.}(p) = \sqrt{pq/(n - 1)} = \sqrt{(41.8)(58.2)/365} = 2.58\%$$

Each patient is a cluster unit with $m = 6$ elements (sites). To find the standard error by the 
correct formula, we need the frequency distribution of the 61 values of $p_i$. It is more 
convenient to tabulate the distribution of $y_i$, the number of negative sites per patient. With 
$p_i$ expressed in percents, $p_i = 100y_i/6$. From the distribution in Table 3.4 we find $\sum f y^2 = 669$ and 

$$\text{s.e.}(\bar{y}) = \sqrt{\frac{\sum f y^2 - (\sum f y)^2/n}{n(n - 1)}} = \sqrt{\frac{669 - (153)^2/61}{(61)(60)}} = 0.279$$

Hence 

$$\text{s.e.}(p) = \frac{100}{6}\,\text{s.e.}(\bar{y}) = 4.65\%$$

This figure is about 1.8 times the value given by the binomial formula. The binomial 
formula requires the assumption that results at different sites on the same patient are 
independent, although actually they have a strong positive correlation. The last line of 
Table 3.4 shows the expected number of patients with 0, 1, 2, . . . negative sites, computed 
from the binomial $(0.58 + 0.42)^6$. Note the marked excesses of observed frequencies $f$ of 
patients with zero negatives and with five and six negatives. 


TABLE 3.4 
NUMBER OF NEGATIVE SITES PER PATIENT 

y_i = 6p_i/100    0      1      2      3      4      5      6      Total 
f_i               17     11     4      4      7      14     4      61 
f_i y_i           0      11     8      12     28     70     24     153 
f_exp             2.3    10.1   18.3   17.6   9.6    2.8    0.3    61.0 
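
The two standard errors in Example 1 can be recomputed from the frequencies in Table 3.4. The sketch below treats each patient as a cluster of $m = 6$ sites and ignores the fpc, as the example does; the function name is illustrative.

```python
from math import sqrt

y_values = [0, 1, 2, 3, 4, 5, 6]   # negative sites per patient
freqs = [17, 11, 4, 4, 7, 14, 4]   # observed numbers of patients (Table 3.4)

def cluster_and_binomial_se(y_values, freqs, m):
    """Return (cluster s.e., binomial s.e.) of the percentage negative."""
    n = sum(freqs)                                        # number of clusters
    sum_y = sum(f * y for f, y in zip(freqs, y_values))
    sum_y2 = sum(f * y * y for f, y in zip(freqs, y_values))
    se_ybar = sqrt((sum_y2 - sum_y ** 2 / n) / (n * (n - 1)))
    se_cluster = 100 * se_ybar / m                        # p_i = 100 y_i / m
    p = 100 * sum_y / (n * m)                             # overall percentage
    se_binomial = sqrt(p * (100 - p) / (n * m - 1))       # treats sites as independent
    return se_cluster, se_binomial

se_cluster, se_binomial = cluster_and_binomial_se(y_values, freqs, m=6)
print(round(se_cluster, 2), round(se_binomial, 2))  # 4.65 2.58
```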
If the size of cluster is not constant, let $m_i$ be the number of elements in the $i$th 
cluster unit and let $p_i = a_i/m_i$. The proportion of elements falling in class C in the 
sample is 

$$p = \frac{\sum a_i}{\sum m_i} \tag{3.31}$$

Structurally, this is a typical ratio estimate, discussed in section 2.11 and later in 
Chapter 6. It is slightly biased, although the bias is seldom likely to be of practical 
importance. 

If we put $a_i$ for $y_i$ and $m_i$ for $x_i$ in (2.39), the approximate variance of p is 

$$V(p) \approx \frac{1 - f}{n\bar{M}^2}\,\frac{\sum_{i=1}^{N} (a_i - Pm_i)^2}{N - 1} \tag{3.32}$$

where $P$ is the proportion of elements in C in the population and $\bar{M} = \sum m_i/N$ is 
the average number of elements per cluster. An alternative expression is 

$$V(p) \approx \frac{1 - f}{n\bar{M}^2}\,\frac{\sum_{i=1}^{N} m_i^2 (p_i - P)^2}{N - 1} \tag{3.33}$$

This form shows that the approximate variance involves a weighted sum of 
squares of deviations of the $p_i$ from the population value $P$. 
For the estimated variance we have 

$$v(p) = \frac{1 - f}{n\bar{m}^2}\,\frac{\sum a_i^2 - 2p\sum a_i m_i + p^2 \sum m_i^2}{n - 1} \tag{3.34}$$

where $\bar{m} = \sum m_i/n$ is the average number of elements per cluster in the sample. 


Example 2. A simple random sample of 30 households was drawn from a census taken 
in 1947 in wards 6 and 7 of the Eastern Health District of Baltimore. The population 
contains about 15,000 households. In Table 3.5 the persons in each household are classified 
(a) according to whether they had consulted a doctor in the last 12 months, (b) according 
to sex. 

Our purpose is to contrast the ratio formula with the inappropriate binomial formula. 

Consider first the proportion of people who had consulted a doctor. For the binomial 
formula, we would take 

$$n = 104, \qquad p = \frac{30}{104} = 0.2885$$

Hence 

$$v_{bin}(p) = \frac{pq}{n} = \frac{(0.2885)(0.7115)}{104} = 0.00197$$
TABLE 3.5 
DATA FOR A SIMPLE RANDOM SAMPLE OF 30 HOUSEHOLDS 

                                                       Doctor Seen in 
Household    Number of       Number of                   Last Year 
Number       Persons, m_i    Males, a_i    Females     Yes, a_i    No 

For the ratio formula, we note that there are 30 clusters and take 

$n = 30$ 
$m_i$ = total number in $i$th household 
$a_i$ = number in $i$th household who had seen a doctor 
$p = 0.2885$, as before 
$\bar{m} = 104/30 = 3.4667$ 
$\sum a_i^2 = 86$; $\sum m_i^2 = 404$; $\sum a_i m_i = 113$ 

The fpc may be ignored. Hence, from (3.34), 

$$v(p) = \frac{(86) - 2(0.2885)(113) + (0.2885)^2(404)}{(30)(29)(3.4667)^2} = 0.00520$$

The variance given by the ratio method, 0.00520, is much larger than that given by the 
binomial formula, 0.00197. For various reasons, families differ in the frequency with which 
their members consult a doctor. For the sample as a whole, the proportion who consult a 
doctor is only a little more than one in four, but there are several families in which every 
member has seen a doctor. Similar results would be obtained for any characteristic in which 
the members of the same family tend to act in the same way. 

In estimating the proportion of males in the population, the results are different. By the 
same type of calculation, we find 

binomial formula: v(p) = 0.00240 
ratio formula:    v(p) = 0.00114 



Here the binomial formula overestimates the variance. The reason is interesting. Most 
households are set up as a result of a marriage, hence contain at least one male and one 
female. Consequently the proportion of males per family varies less from one half than 
would be expected from the binomial formula. None of the 30 families, except one with 
only one member, is composed entirely of males, or entirely of females. If the binomial 
distribution were applicable, with a true P of approximately one half, households with all 
members of the same sex would constitute one quarter of the households of size 3 and one 
eighth of the households of size 4. This property of the sex ratio has been discussed by 
Hansen and Hurwitz (1942). Other illustrations of the error committed by improper use of 
the binomial formula in sociological investigations have been given by Kish (1957). 
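
The doctor-visit calculation in Example 2 needs only the sums quoted in the text, so it can be checked without the individual household records. The sketch below applies (3.34) with the fpc ignored and compares the result with the binomial variance; the function names are illustrative.

```python
def ratio_variance(n, sum_a, sum_m, sum_a2, sum_m2, sum_am, f=0.0):
    """Estimated variance (3.34) of the ratio estimate p = sum_a / sum_m."""
    p = sum_a / sum_m
    m_bar = sum_m / n
    numerator = sum_a2 - 2 * p * sum_am + p ** 2 * sum_m2
    return (1 - f) * numerator / (n * (n - 1) * m_bar ** 2)

def binomial_variance(sum_a, sum_m):
    """Binomial variance pq/n that treats the persons as independent."""
    p = sum_a / sum_m
    return p * (1 - p) / sum_m

# Sums quoted in Example 2: 30 households, 104 persons, 30 of whom saw a doctor.
v_ratio = ratio_variance(n=30, sum_a=30, sum_m=104, sum_a2=86, sum_m2=404, sum_am=113)
v_binomial = binomial_variance(sum_a=30, sum_m=104)
print(v_ratio, v_binomial)
```

The ratio variance comes out at about 0.0052 and the binomial variance at about 0.0020, so the ratio formula gives a variance roughly 2.6 times as large.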


EXERCISES 


3.1 For a population with $N = 6$, $A = 4$, $A' = 2$, work out the value of $a$ for all possible 
simple random samples of size 3. Verify the theorems given for the mean and variance of 
$p = a/n$. Verify that 

$$\frac{N - n}{(n - 1)N}\,pq$$

is an unbiased estimate of the variance of p. 

3.2 In a simple random sample of 200 from a population of 2000 colleges, 120 colleges 
were in favor of a proposal, 57 were opposed, and 23 had no opinion. Estimate 95% 
confidence limits for the number of colleges in the population that favored the 
proposal. 

3.3 Do the results of the previous sample furnish conclusive evidence that the majority 
of the colleges in the population favored this proposal? 

3.4 A population with $N = 7$ consists of the elements $B_1$, $C_1$, $C_2$, $C_3$, $D_1$, $D_2$, and $D_3$. A 
simple random sample of size 4 is taken in order to estimate the proportion of C’s to 
C’s + D’s. Work out the conditional distributions of this proportion, p, and verify the 
formula for its conditional variance. 

3.5 In the preceding exercise, what is the probability that a sample of size 4 contains 
$B_1$? Find the average variance of p in exercise 3.4 over all simple random samples of size 4. 
This is 0.0393 as against 0.025 with $N = 6$, $n = 4$, and $B_1$ absent. 

3.6 A simple random sample of 290 households was chosen from a city area containing 
14,828 households. Each family was asked whether it owned or rented the house and also 
whether it had the exclusive use of an indoor toilet. Results were as follows. 


                             Owned           Rented 
Exclusive use of toilet      Yes    No       Yes    No       Total 
                             141    6        109    34       290 


(a) For families who rent, estimate the percentage in the area with exclusive use of an 
indoor toilet and give the standard error of your estimate; (b) estimate the total number of 
renting families in the area who do not have exclusive indoor toilet facilities and give the 
standard error of this estimate. 

3.7 If, in exercise 3.6, the total number of renting families in the city area is 7526, make 
a new estimate of the number of renters without exclusive toilet facilities and give the 
standard error of this estimate. 

3.8 For estimating the total number of units in class C in domain 1 (section 3.10), the 
estimate $\hat{A}_1 = N_1 p_1$ was recommended if $N_1$ were known, as against $\hat{A}_1' = Na_1/n$ if $N_1$ were 
not known. Ignoring the fpc, show that in large samples the ratio of the variance of $\hat{A}_1$ to 
that of $\hat{A}_1'$ is approximately $Q_1/(Q_1 + P_1\pi)$, where $\pi$ is the proportion of the population 
that is not in domain 1, and $P_1$, as in section 3.10, is the proportion of the units in domain 1 
that fall in class C. State the conditions under which knowledge of $N_1$ produces large 
reductions in variance. 

3.9 In a simple random sample of size 5 from a population of size 30, no units in the 
sample were in class C. By the hypergeometric distribution, find the upper limit to the 
number $A$ of units in class C in the population, corresponding to a one-tailed confidence 
probability of 95%. Find also the approximation to $A_U$ obtained by computing the upper 
95% binomial limit $P_U$ and shortening the interval as described in section 3.6. Try also the 
method on p. 59, Example 3. 

3.10 A student health service has a record of the total number of eligible students $N$ 
and of the total number of visits $Y$ made by students during a year. Some students made no 
visits. The service wishes to estimate the mean number of visits $Y/N_1$ for the $N_1$ students 
who made at least one visit, but does not know the value of $N_1$. A simple random sample of 
$n$ eligible students is taken. In it $n_1$ students out of the $n$ made at least one visit and their 
total number of visits was $y$. Ignore the fpc in this question. (a) Show that $y/n_1$ is an 
unbiased estimate of $Y/N_1$ and that its conditional variance is $S^2/n_1$, where $S^2$ is the 
variance of the number of visits among students making at least one visit. (b) A second 
method of estimating $Y/N_1$ is to use $\hat{N}_1 = Nn_1/n$ as an estimate of $N_1$ and hence $Yn/Nn_1$ as 
an estimate of $Y/N_1$. Show that this estimate is biased and that the ratio of the bias to the 
true value $Y/N_1$ is approximately $(N - N_1)/nN_1$. Find an approximate expression for the 
variance of the estimate $Yn/Nn_1$ and show that the estimate in (a) has a higher variance if 

$$S^2 > \frac{(N - N_1)n_1}{N_1 n}\left(\frac{Y}{N_1}\right)^2$$

Hint. If p is a binomial estimate of P, based on n trials, then approximately 

$$E\left(\frac{1}{p}\right) \approx \frac{1}{P} + \frac{Q}{nP^2}, \qquad V\left(\frac{1}{p}\right) \approx \frac{Q}{nP^3}$$

3.11 Which of the two previous estimates seems more precise in the following 
circumstances? $N = 2004$, $Y = 3011$. The sample with $n = 100$ showed that 73 students 
made at least one visit. Their total number of visits was 152 and the estimated variance $s^2$ 
was 1.55. 

3.12 A simple random sample of n cluster units, each with m elements, is taken from a 
population in which the proportion of elements in class C is P. As the intracluster 
correlation varies, what are the highest and lowest possible values of the true variance of p 
(the sample estimate of P) and how do they compare with the binomial variance? Ignore the 
fpc. 

3.13 For the sample of 30 households in Table 3.5, the data shown below refer to visits 
to the dentist in the last year. Estimate the variance of the proportion of persons who saw a 
dentist, and compare this with the binomial estimate of the variance. 

Number of    Dentist Seen       Number of    Dentist Seen 
Persons      Yes    No          Persons      Yes    No 

3.14 In sampling for a rare attribute, one method is to continue drawing a simple 
random sample until m units that possess the rare attribute have been found (Haldane, 
1945) where m is chosen in advance. If the fpc is ignored, prove that the probability that the 
total sample required is of size n is 

$$\frac{(n - 1)!}{(m - 1)!\,(n - m)!}\,P^m Q^{n - m} \qquad (n \ge m)$$

where P is the frequency of the rare attribute. Find the average size of the total sample and 
show that if $m > 1$, $p = (m - 1)/(n - 1)$ is an unbiased estimate of P. (For further discussion, 
see Finney, 1949, and Sandelius, 1951, who considers a plan in which sampling continues 
until either m have been found or the total sample size has reached a preassigned limit $n_0$.) 
See also section 4.5. 


CHAPTER 4 


The Estimation of Sample Size 


4.1 A HYPOTHETICAL EXAMPLE 


In the planning of a sample survey, a stage is always reached at which a decision 
must be made about the size of the sample. The decision is important. Too large a 
sample implies a waste of resources, and too small a sample diminishes the utility 
of the results. The decision cannot always be made satisfactorily; often we do not 
possess enough information to be sure that our choice of sample size is the best 
one. Sampling theory provides a framework within which to think intelligently 
about the problem. 

A hypothetical example brings out the steps involved in reaching a solution. An 
anthropologist is preparing to study the inhabitants of some island. Among other 
things, he wishes to estimate the percentage of inhabitants belonging to blood 
group O. Cooperation has been secured so that it is feasible to take a simple 
random sample. How large should the sample be? 

This question cannot be discussed without first receiving an answer to another 
question. How accurately does the anthropologist wish to know the percentage of 
people with blood group O? In reply he states that he will be content if the 
percentage is correct within ±5% in the sense that, if the sample shows 43% to 
have blood group O, the percentage for the whole island is sure to lie between 38 
and 48. 

To avoid misunderstanding, it may be advisable to point out to the anthropologist 
that we cannot absolutely guarantee accuracy within 5% except by measuring 
everyone. However large n is taken, there is a chance of a very unlucky sample 
that is in error by more than the desired 5%. The anthropologist replies coldly that 
he is aware of this, that he is willing to take a 1 in 20 chance of getting an unlucky 
sample, and that all he asks for is the value of n instead of a lecture on statistics. 

We are now in a position to make a rough estimate of n. To simplify matters, the 
fpc is ignored, and the sample percentage p is assumed to be normally distributed. 
Whether these assumptions are reasonable can be verified when the initial n is 
known. 

In technical terms, p is to lie in the range $(P \pm 5)$, except for a 1 in 20 chance. 
Since p is assumed normally distributed about P, it will lie in the range $(P \pm 2\sigma_p)$, 
apart from a 1 in 20 chance. Furthermore, 

$$\sigma_p = \sqrt{PQ/n}$$

Hence, we may put 

$$2\sqrt{PQ/n} = 5 \qquad \text{or} \qquad n = \frac{4PQ}{25}$$


At this point a difficulty appears that is common to all problems in the 
estimation of sample size. A formula for n has been obtained, but n depends on 
some property of the population that is to be sampled. In this instance the 
property is the quantity P that we would like to measure. We therefore ask the 
anthropologist if he can give us some idea of the likely value of P. He replies that 
from previous data on other ethnic groups, and from his speculations about the 
racial history of this island, he will be surprised if P lies outside the range 30 to 
60%. 


This information is sufficient to provide a usable answer. For any value of P 
between 30 and 60, the product PQ lies between 2100 and a maximum of 2500 at 
P = 50. The corresponding n lies between 336 and 400. To be on the safe side, 400 
is taken as the initial estimate of n. 

The assumptions made in this analysis can now be reexamined. With n = 400 
and a P between 30 and 60, the distribution of p should be close to normal. 
Whether the fpc is required depends on the number of people on the island. If the 
population exceeds 8000, the sampling fraction is less than 5% and no adjustment 
for fpc is called for. The method of applying the readjustment, if it is needed, is 
discussed in section 4.4. 
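
The range of sample sizes quoted in this example follows from $n = 4PQ/25$, with P and Q in percent. A minimal sketch, with an illustrative function name:

```python
def sample_size_percent(P, d=5.0, t=2.0):
    """n = t^2 PQ / d^2, with P, Q = 100 - P, and d all in percent; fpc ignored."""
    Q = 100.0 - P
    return t ** 2 * P * Q / d ** 2

for P in (30, 40, 50, 60):
    print(P, sample_size_percent(P))  # 336.0, 384.0, 400.0, 384.0

# With n = 400, the sampling fraction is 5% for a population of 8000.
print(400 / 8000)  # 0.05
```

Over the range 30 to 60% the smallest requirement, 336, occurs at P = 30 and the largest, 400, at P = 50.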


4.2 ANALYSIS OF THE PROBLEM 


The principal steps involved in the choice of a sample size are as follows. 


1. There must be some statement concerning what is expected of the sample. 
This statement may be in terms of desired limits of error, as in the previous 
example, or in terms of some decision that is to be made or action that is to be 
taken when the sample results are known. The responsibility for framing the 
statement rests primarily with the persons who wish to use the results of the 
survey, although they frequently need guidance in putting their wishes into 
numerical terms. 

2. Some equation that connects n with the desired precision of the sample must 
be found. The equation will vary with the content of the statement of precision and 
with the kind of sampling that is contemplated. One of the advantages of 
probability sampling is that it enables this equation to be constructed. 

3. This equation will contain, as parameters, certain unknown properties of the 
population. These must be estimated in order to give specific results. 
4. It often happens that data are to be published for certain major subdivisions 
of the population and that desired limits of error are set up for each subdivision. A 
separate calculation is made for the n in each subdivision, and the total n is found 
by addition. 

5. More than one item or characteristic is usually measured in a sample survey: 
sometimes the number of items is large. If a desired degree of precision is 
prescribed for each item, the calculations lead to a series of conflicting values of n, 
one for each item. Some method must be found for reconciling these values. 

6. Finally, the chosen value of n must be appraised to see whether it is 
consistent with the resources available to take the sample. This demands an 
estimation of the cost, labor, time, and materials required to obtain the proposed 
size of sample. It sometimes becomes apparent that n will have to be drastically 
reduced. A hard decision must then be faced—whether to proceed with a much 
smaller sample size, thus reducing precision, or to abandon efforts until more 
resources can be found. 


In succeeding sections some of these questions are examined in more detail. 


4.3 THE SPECIFICATION OF PRECISION 


The statement of precision desired may be made by giving the amount of error 
that we are willing to tolerate in the sample estimates. This amount is determined, 
as best we can, in the light of the uses to which the sample results are to be put. 
Sometimes it is difficult to decide how much error should be tolerated, particularly 
when the results have several different uses. Suppose that we asked the 
anthropologist why he wished the percentage with blood group O to be correct to 
5% instead of, say, 4 or 6%. He might reply that the blood group data are to be 
used primarily for racial classification. He strongly suspects that the islanders 
belong either to a racial type with a P of about 35% or to one with a P of about 
50%. An error limit of 5% in the estimate seemed to him small enough to permit 
classification into one of these types. He would, however, have no violent 
objection to 4 or 6% limits of error. 

Thus the choice of a 5% limit of error by the anthropologist was to some extent 
arbitrary. In this respect the example is typical of the way in which a limit of error 
is often decided on. In fact, the anthropologist was more certain of what he wanted 
than many other scientists and administrators will be found to be. When the 
question of desired degree of precision is first raised, such persons may confess 
that they have never thought about it and have no idea of the answer. My 
experience has been, however, that after discussion they can frequently indicate at 
least roughly the size of a limit of error that appears reasonable to them. 

Further than this we may not be able to go in many practical situations. Part of 
the difficulty is that not enough is known about the consequences of errors of 
different sizes as they affect the wisdom of practical decisions that are made from 
survey results. Even when these consequences are known, however, the results of 
many important surveys are used by different people for different purposes, and 
some of the purposes are not foreseen at the time when the survey is planned. 
Therefore, an element of guesswork is likely to be prominent in the specification 
of precision for some time to come. 

If the sample is taken for a very specific purpose (e.g., for making a single “yes” 
or “no” decision or for deciding how much money to spend on a certain venture), 
the precision needed can usually be stated in a more definite manner, in terms of 
the consequences of errors in the decision. A general approach to problems of this 
type is given in section 4.10, which, although in need of amplification, offers a 
logical start on a solution. 


4.4 THE FORMULA FOR n IN SAMPLING FOR PROPORTIONS 


The units are classified into two classes, C and C’. Some margin of error $d$ in the 
estimated proportion $p$ of units in class C has been agreed on, and there is a small 
risk $\alpha$ that we are willing to incur that the actual error is larger than $d$; that is, we 
want 

$$\Pr(|p - P| \ge d) = \alpha$$

Simple random sampling is assumed, and p is taken as normally distributed. 
From theorem 3.2, section 3.2, 

$$\sigma_p = \sqrt{\frac{N - n}{N - 1}}\,\sqrt{\frac{PQ}{n}}$$

Hence the formula that connects n with the desired degree of precision is 

$$d = t\sqrt{\frac{N - n}{N - 1}}\,\sqrt{\frac{PQ}{n}}$$

where $t$ is the abscissa of the normal curve that cuts off an area of $\alpha$ at the tails. 
Solving for n, we find 

$$n = \frac{\dfrac{t^2 PQ}{d^2}}{1 + \dfrac{1}{N}\left(\dfrac{t^2 PQ}{d^2} - 1\right)} \tag{4.1}$$

For practical use, an advance estimate p of P is substituted in this formula. If N is 
large, a first approximation is 

$$n_0 = \frac{t^2 pq}{d^2} = \frac{pq}{V} \tag{4.2}$$

where 

$$V = \frac{pq}{n_0} = \text{desired variance of the sample proportion}$$

In practice we first calculate $n_0$. If $n_0/N$ is negligible, $n_0$ is a satisfactory 
approximation to the n of (4.1). If not, it is apparent on comparison of (4.1) and 
(4.2) that n is obtained as 

$$n = \frac{n_0}{1 + (n_0 - 1)/N} \approx \frac{n_0}{1 + (n_0/N)} \tag{4.3}$$

Example. In the hypothetical blood groups example we had 

$$d = 0.05, \quad p = 0.5, \quad \alpha = 0.05, \quad t = 2$$

Thus 

$$n_0 = \frac{(4)(0.5)(0.5)}{(0.0025)} = 400$$

Let us assume that there are only 3200 people on the island. The fpc is needed, and we 
find 

$$n = \frac{n_0}{1 + (n_0 - 1)/N} = \frac{400}{1 + \frac{399}{3200}} = 356$$

The formula for $n_0$ holds also if d, p, and q are all expressed as percentages instead of 
proportions. Since the product pq increases as p moves toward 1/2, or 50%, a conservative 
estimate of n is obtained by choosing for p the value nearest to 1/2 in the range in which p is 
thought likely to lie. If p seems likely to lie between 5 and 9%, for instance, we assume 9% 
for the estimation of n. 


Sometimes, particularly when estimating the total number NP of units in class 
C, we wish to control the relative error $r$ instead of the absolute error in Np; for 
example, we may wish to estimate NP with an error not exceeding 10%. That is, 
we want 

$$\Pr\left(\frac{|Np - NP|}{NP} \ge r\right) = \Pr(|p - P| \ge rP) = \alpha$$

For this specification, we substitute $rP$ or $rp$ for $d$ in formulas (4.1) and (4.2). From 
(4.2) we get 

$$n_0 = \frac{t^2 pq}{r^2 p^2} = \frac{t^2}{r^2}\,\frac{q}{p} \tag{4.2$'$}$$

Formula (4.3) is unchanged. 
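
The calculations of this section can be sketched as three small functions: the first approximation (4.2), the adjustment (4.3), and the relative-error version (4.2)'. The function names are illustrative, and $t = 2$ is used as the default, as in the blood groups example.

```python
def n0_proportion(p, d, t=2.0):
    """First approximation (4.2): n0 = t^2 pq / d^2."""
    return t ** 2 * p * (1 - p) / d ** 2

def n_with_fpc(n0, N):
    """Adjustment (4.3): n = n0 / (1 + (n0 - 1)/N)."""
    return n0 / (1 + (n0 - 1) / N)

def n0_relative(p, r, t=2.0):
    """Relative-error version (4.2)': n0 = (t^2 / r^2)(q / p)."""
    return t ** 2 * (1 - p) / (r ** 2 * p)

n0 = n0_proportion(p=0.5, d=0.05)
print(round(n0), round(n_with_fpc(n0, N=3200)))  # 400 356
```

For a fixed relative error the required $n_0$ is proportional to $q/p$, so `n0_relative(0.01, r)` is 11 times `n0_relative(0.10, r)` for any r.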


4.5 RARE ITEMS—INVERSE SAMPLING 


In estimating n from formulas (4.1), (4.2), and (4.2)', the sampler inserts his best 
advance estimate of the population proportion P. If P is known to be between 30 
and 70%, as in the example in section 4.1, accurate estimation of P is not crucial. 
But with a rare item (e.g., $P < 10\%$), the necessary n for a specified relative error r 
is 11 times as large when $P = 1\%$ as when $P = 10\%$. In this situation (P small but 
not well known in advance), Haldane’s (1945) method of continuing sampling 
until m of the rare items have been found in the sample has one important 
advantage. The method is usually called inverse sampling. 

If n is the sample size at which the mth rare item appears ($m > 1$), an unbiased 
estimate of P is $p = (m - 1)/(n - 1)$. For N very large, P small, and $m \ge 10$, a good 
approximation to $V(p)$ may be shown to be $mP^2Q/(m - 1)^2$. Hence, 
$cv(p) = (mQ)^{1/2}/(m - 1) < \sqrt{m}/(m - 1)$, which will be a close upper limit if P is 
small. Thus, by fixing m in advance, we can control the value of cv(p) without 
advance knowledge of P. The value $m = 27$ gives $cv(p) < 20\%$, but $m = 102$ is 
needed for $cv(p) < 10\%$. The value of n with this method is a random variable, but 
will be large if P is small. 
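
The values of m quoted above can be recovered by searching for the smallest m at which the upper limit $\sqrt{m}/(m - 1)$ falls below the target cv. A minimal sketch, with illustrative function names:

```python
from math import sqrt

def cv_upper_limit(m):
    """Upper limit sqrt(m)/(m - 1) on cv(p) in inverse sampling."""
    return sqrt(m) / (m - 1)

def smallest_m(target_cv):
    """Smallest m > 1 whose upper limit on cv(p) is below target_cv.
    The limit decreases steadily as m grows, so a linear search suffices."""
    m = 2
    while cv_upper_limit(m) >= target_cv:
        m += 1
    return m

print(smallest_m(0.20), smallest_m(0.10))  # 27 102
```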


4.6 THE FORMULA FOR n WITH CONTINUOUS DATA 


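Most commonly, we wish to control the relative error $r$ in the estimated 
population total or mean. With a simple random sample having mean $\bar{y}$, we want 

$$\Pr\left(\left|\frac{\bar{y} - \bar{Y}}{\bar{Y}}\right| \ge r\right) = \Pr\left(\left|\frac{N\bar{y} - N\bar{Y}}{N\bar{Y}}\right| \ge r\right) = \Pr(|\bar{y} - \bar{Y}| \ge r\bar{Y}) = \alpha$$

where $\alpha$ is a small probability. We assume that $\bar{y}$ is normally distributed; from 
theorem 2.2, corollary 1, its standard error is 

$$\sigma_{\bar{y}} = \sqrt{\frac{N - n}{N}}\,\frac{S}{\sqrt{n}}$$

Hence 

$$r\bar{Y} = t\sigma_{\bar{y}} = t\sqrt{\frac{N - n}{N}}\,\frac{S}{\sqrt{n}} \tag{4.4}$$

Solving for n gives 

$$n = \left(\frac{tS}{r\bar{Y}}\right)^2 \bigg/ \left[1 + \frac{1}{N}\left(\frac{tS}{r\bar{Y}}\right)^2\right]$$

Note that the population characteristic on which n depends is its coefficient of 
variation $S/\bar{Y}$. This is often more stable and easier to guess in advance than S 
itself. 
As a first approximation we take 

$$n_0 = \left(\frac{tS}{r\bar{Y}}\right)^2 = \frac{1}{C}\left(\frac{S}{\bar{Y}}\right)^2 \tag{4.5}$$

substituting an advance estimate of $(S/\bar{Y})$. The quantity $C = (r/t)^2$ is the desired $(cv)^2$ of 
the sample estimate. 
If $n_0/N$ is appreciable we compute n as in (4.3) 

$$n = \frac{n_0}{1 + (n_0/N)} \tag{4.3$'$}$$

If instead of the relative error $r$ we wish to control the absolute error $d$ in $\bar{y}$, we 
take $n_0 = t^2S^2/d^2 = S^2/V$, where $V$ is the desired variance of $\bar{y}$. 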


Example. In nurseries that produce young trees for sale it is advisable to estimate, in 
late winter or early spring, how many healthy young trees are likely to be on hand, since this 
determines policy toward the solicitation and acceptance of orders. A study of sampling 
methods for the estimation of the total numbers of seedlings was undertaken by Johnson 
(1943). The data that follow were obtained from a bed of silver maple seedlings 1 ft wide 
and 430 ft long. The sampling unit was 1 ft of the length of the bed, so that N = 430. By 
complete enumeration of the bed it was found that $\bar{Y} = 19$, $S^2 = 85.6$, these being the true 
population values. 

With simple random sampling, how many units must be taken to estimate $\bar{Y}$ within 10%, 
apart from a chance of 1 in 20? From (4.5) we obtain 

$$n_0 = \frac{t^2S^2}{r^2\bar{Y}^2} = \frac{(4)(85.6)}{(1.9)^2} = 95$$

Since $n_0/N$ is not negligible, we take 

$$n = \frac{95}{1 + \frac{95}{430}} = 78$$

Almost 20% of the bed has to be counted in order to attain the precision desired. 
The formulas for n given here apply only to simple random sampling in which 
the sample mean is used as the estimate of $\bar{Y}$. The appropriate formulas for other 
methods of sampling and estimation are presented with the discussion of these 
techniques. 
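
The calculation for continuous data can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and t = 2 is the rounded normal deviate for a 1 in 20 chance, as used in the nursery example.

```python
import math

def n_relative_error(S, Ybar, r, N=None, t=2.0):
    """Sample size that holds the relative error of the mean within r.

    n0 = (t*S / (r*Ybar))**2 is the first approximation (4.5). If N is given,
    the finite-population adjustment n = n0 / (1 + n0/N) is applied.
    Returns the pair (n0, n).
    """
    n0 = (t * S / (r * Ybar)) ** 2
    n = n0 if N is None else n0 / (1.0 + n0 / N)
    return n0, n

def n_absolute_error(S, d, N=None, t=2.0):
    """Same calculation when the absolute error d is controlled: n0 = t^2 S^2 / d^2."""
    n0 = (t * S / d) ** 2
    n = n0 if N is None else n0 / (1.0 + n0 / N)
    return n0, n

# Nursery example: N = 430, Ybar = 19, S^2 = 85.6, r = 10%
n0, n = n_relative_error(S=math.sqrt(85.6), Ybar=19.0, r=0.10, N=430)
print(round(n0), round(n))
```

The sketch carries the unrounded $n_0$ into the adjustment, whereas the example rounds $n_0$ to 95 first; both routes round to the same n here.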


4.7 ADVANCE ESTIMATES OF POPULATION VARIANCES 


The nursery example is atypical in that the population variance $S^2$ was known. 
In practice, there are four ways of estimating population variances for sample size 
determinations: (1) by taking the sample in two steps, the first being a simple 
random sample of size $n_1$ from which estimates $s_1^2$ or $p_1$ of $S^2$ or P and the 
required n will be obtained; (2) by the results of a pilot survey; (3) by previous 
sampling of the same or a similar population; and (4) by guesswork about the 
structure of the population, assisted by some mathematical results. 

Method 1 gives the most reliable estimates of $S^2$ or P, but it is not often used, 
since it slows up the completion of the survey. When the method is feasible, Cox 
(1952), following work by Stein (1945), shows how to compute n from $s_1^2$ or $p_1$ so 
that the final estimate $\bar{y}$ or p will have a preassigned variance V, a preassigned 
limit of error d, or a preassigned coefficient of variation. The first sample is 
assumed large enough to neglect terms of order $1/n_1^2$. A few results are quoted. 

The results given here assume $n_1 \le n$, the size of the final sample. When this is 
not so, see Cox (1952). 


Estimation of $\bar{Y}$ with given cv = $\sqrt{C}$ 


The results assume $y_i$ normally distributed. If $s_1^2$ is the estimated variance from 
the first sample, take additional units to make the final sample size 

$$n = \frac{s_1^2}{C\bar{y}_1^2}\left(1 + 8C + \frac{s_1^2}{n_1\bar{y}_1^2} + \frac{2}{n_1}\right) \tag{4.6}$$

The mean $\bar{y}$ of the final sample is slightly biased. Take $\hat{\bar{Y}} = \bar{y}(1 - 2C)$. 


Estimation of $\bar{Y}$ with Variance V 

Take additional units to make the total sample size 

$$n = \frac{s_1^2}{V}\left(1 + \frac{2}{n_1}\right) \tag{4.7}$$

If S were known exactly, the required sample size would be $S^2/V$. The effect of 
not knowing S is to increase the average size by the factor $(1 + 2/n_1)$. 


Estimation of P with Variance V 

Let $p_1$ be the estimate of P from the first sample. The combined size of the first 
two samples should be 

$$n = \frac{p_1q_1}{V} + \frac{3 - 8p_1q_1}{p_1q_1} + \frac{1 - 3p_1q_1}{Vn_1} \tag{4.8}$$

The first term on the right is the size required if P is known to be equal to $p_1$. With 
this method, the ordinary binomial estimate p made from the complete sample of 
size n is slightly biased. To correct for bias, take 

$$\hat{P} = p + \frac{V(1 - 2p)}{pq}$$


Estimation of P with given cv = $\sqrt{C}$ 

Take 

$$n = \frac{q_1}{Cp_1} + \frac{3}{p_1q_1} + \frac{1}{Cp_1n_1} \tag{4.9}$$

The estimate is $\hat{P} = p - Cp/q$. In all results given above the fpc is ignored. 


Example. A sampler wishes to estimate P with a coefficient of variation of 0.1 (10%). 
He guesses that P will lie somewhere between 5 and 20%. This range is too wide to give a 
good initial estimate of the required n. Since the cv of p is $\sqrt{Q/nP}$, it is easily verified that 
n = 400 is adequate for P = 20%, but n = 1900 will be needed if P is only 5%. 
Accordingly, he takes an initial sample with $n_1 = 396$ and finds $p_1 = 0.101$. Since 
$\sqrt{C} = 0.1$, C = 0.01. Equation 4.9 gives 

$$n = \frac{(0.899)}{(0.01)(0.101)} + \frac{3}{(0.0908)} + \frac{1}{(0.01)(40)} = 926$$

The combined sample gives np = 88; p = 88/926 = 0.0950. The correction for bias, Cp/q, 
amounts to 0.0011, giving a final estimate of 0.094 or 9.4%. 
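
The two-step rules for a proportion are easy to mistype by hand, so a small Python sketch may help. It assumes the forms of (4.8) and (4.9) quoted in this section; the function names are hypothetical, and rounding the result up to the next whole unit is an assumption of the sketch.

```python
import math

def two_step_n_for_cv(p1, n1, C):
    """Total sample size from (4.9): q1/(C p1) + 3/(p1 q1) + 1/(C p1 n1)."""
    q1 = 1.0 - p1
    return q1 / (C * p1) + 3.0 / (p1 * q1) + 1.0 / (C * p1 * n1)

def corrected_estimate_cv(p, C):
    """Bias-corrected estimate p - C p / q for the given-cv case."""
    return p - C * p / (1.0 - p)

def two_step_n_for_variance(p1, n1, V):
    """Total sample size from (4.8) for a preassigned variance V."""
    pq = p1 * (1.0 - p1)
    return pq / V + (3.0 - 8.0 * pq) / pq + (1.0 - 3.0 * pq) / (V * n1)

# Worked example: n1 = 396, p1 = 0.101, desired cv = 0.1 so C = 0.01
n_total = math.ceil(two_step_n_for_cv(p1=0.101, n1=396, C=0.01))
p = 88 / n_total                     # 88 units with the attribute in the combined sample
print(n_total, round(corrected_estimate_cv(p, C=0.01), 3))
```

The leading term of each formula is the size that would be needed if P were known to equal $p_1$; the remaining terms are the price of estimating it from the first sample.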


The second method, a small pilot survey, serves many purposes, especially if the 
feasibility of the main survey is in doubt. If the pilot survey is itself a simple 
random sample, the preceding methods apply. But often the pilot work is 
restricted to a part of the population that is convenient to handle or that will reveal 
the magnitude of certain problems. Allowance must be made for the selective 
nature of the pilot when using its results to estimate $S^2$ or P. For instance, a 
common practice is to confine the pilot work to a few clusters of units. Thus the 
computed $s^2$ measures mostly the variation within a cluster and may be an 
underestimate of the relevant $S^2$. The relation between intra- and intercluster 
variation is discussed in Chapter 9. The same problem arises in cluster sampling 
for proportions, in which the formula pq/n may underestimate the effect of 
variation among clusters. Cornfield (1951) gives a good illustration of the 
estimation of sample size in cluster sampling for proportions. 

The third method, the use of results from previous surveys, points to the value of 
making available, or at least keeping accessible, any data on standard deviations 
obtained in previous surveys. The cost of computing standard 
deviations in complex surveys is high, even with electronic machines, and frequently 
only those s.d.’s needed to give a rough idea of the precision of the 
principal estimates are computed and recorded. If suitable past data are found, the 
value of $S^2$ may require adjustment for time changes. With skew data in which $\bar{Y}$ is 
changing with time, $S^2$ is often found to change at a rate lying somewhere between 
$k\bar{Y}$ and $k\bar{Y}^2$, where k is a constant. Thus, if $\bar{Y}$ is thought to have increased by 10% 
in the time interval since the previous survey, we might increase our initial 
estimate of $S^2$ by 10 to 20%. 

Finally, a serviceable estimate of $S^2$ can sometimes be made from relatively 
little information about the nature of the population. In early studies of the 
numbers of wireworms in soils, a tool was used to take a sample (9 × 9 × 5 in.) of 
the topsoil. For estimating n, the sampler needed to know the standard deviation 
of the number of wireworms found in a boring with the tool. If wireworms were 
distributed at random over the topsoil, the number found in a small volume would 
follow the Poisson distribution, for which $S^2 = \bar{Y}$. Since there might be some 
tendency for wireworms to congregate, it was decided to assume $S^2 = 1.2\bar{Y}$, the 
factor 1.2 being an arbitrary safety factor. Although $\bar{Y}$ was not known, the values 
of $\bar{Y}$ that are of economic importance with respect to crop damage could be 
delineated. These two pieces of information made it possible to determine sample 
sizes that proved satisfactory. 

Deming (1960) shows how some simple mathematical distributions may be 
used to estimate $S^2$ from a knowledge of the range and a general idea of the shape 
of the distribution. If the distribution is like a binomial, with a proportion p of the 
observations at one end of the range and a proportion q at the other end, 
$S^2 = pqh^2$, where h is the range. When $p = q = \tfrac{1}{2}$, the value of $S^2 = 0.25h^2$ is the 
maximum possible for a given range h. Other useful relations are that 
$S^2 = 0.083h^2$ for a rectangular distribution, $S^2 = 0.056h^2$ for a distribution shaped like 
a right triangle, and $S^2 = 0.042h^2$ for an isosceles triangle. 

These relations do not help much if h is large or poorly known. However, if h is 
large, good sampling practice is to stratify the population (Chapter 5) so that 
within any stratum the range is much reduced. Usually the shape also becomes 
simpler (closer to rectangular) within a stratum. Consequently, these relations are 
effective in predicting $S^2$, hence n, within individual strata. 
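
As a rough aid, the range-based relations can be collected in one lookup. The sketch below is illustrative Python, not part of Deming's account: the shape labels are invented for the example, and the exact fractions 1/12, 1/18, and 1/24 are the standard variances of the rectangular, right-triangular, and isosceles-triangular densities, which round to the decimal factors 0.083, 0.056, and 0.042.

```python
# Variance as a fraction of the squared range h^2, by assumed shape.
SHAPE_FACTORS = {
    "rectangular": 1 / 12,         # about 0.083
    "right_triangle": 1 / 18,      # about 0.056
    "isosceles_triangle": 1 / 24,  # about 0.042
}

def guess_variance(h, shape="rectangular", p=None):
    """Advance guess at S^2 from the range h and a general idea of the shape.

    For shape == "two_point" (binomial-like, mass p at one end of the range
    and q = 1 - p at the other) the guess is p*q*h^2 and p must be supplied.
    """
    if shape == "two_point":
        if p is None:
            raise ValueError("p is required for the two-point shape")
        return p * (1.0 - p) * h ** 2
    return SHAPE_FACTORS[shape] * h ** 2

# Hypothetical stratum with range h = 40 units, thought to be roughly rectangular
print(round(guess_variance(40, "rectangular"), 1))
```

A guess made this way feeds directly into $n_0 = S^2/V$ for the stratum concerned.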


4.8 SAMPLE SIZE WITH MORE THAN ONE ITEM 


In most surveys information is collected on more than one item. One method of 
determining sample size is to specify margins of error for the items that are 
regarded as most vital to the survey. An estimation of the sample size needed is 
first made separately for each of these important items. 

When the single item estimations of n have been completed, it is time to take 
stock of the situation. It may happen that the n’s required are all reasonably close. 
If the largest of the n’s falls within the limits of the budget, this n is selected. More 
commonly, there is a sufficient variation among the n’s so that we are reluctant to 
choose the largest, either from budgetary considerations or because this will give 
an over-all standard of precision substantially higher than originally contem- 
plated. In this event the desired standard of precision may be relaxed for certain of 
the items, in order to permit the use of a smaller value of n. 

In some cases the n’s required for different items are so discordant that certain 
of them must be dropped from the inquiry; with the resources available the 
precision expected for these items is totally inadequate. The difficulty may not be 
merely one of sample size. Some items call for a different type of sampling from 
others. With populations that are sampled repeatedly, it is useful to amass 
information about those items that can be combined economically in a general 
survey and those that necessitate special methods. As an example, a classification 
of items into four types, suggested by experience in regional agricultural surveys, 
is shown in Table 4.1. In this classification, a general survey means one in which 
the units are fairly evenly distributed over some region as, for example, by a 
simple random sample. 


TABLE 4.1 

AN EXAMPLE OF DIFFERENT TYPES OF ITEM IN 
REGIONAL SURVEYS 

Type  Characteristics of Item                       Type of Sampling Needed 
 1    Widespread throughout the region, occurring   A general survey with low 
      with reasonable frequency in all parts.       sampling ratio. 
 2    Widespread throughout the region but with     A general survey, but with 
      low frequency.                                a higher sampling ratio. 
 3    Occurring with reasonable frequency in        For best results, a stratified 
      most parts of the region, but with more       sample with different intensities 
      sporadic distribution, being absent in        in different parts of the region 
      some parts and highly concentrated in         (Chapter 5). Can sometimes be 
      others.                                       included in a general survey 
                                                    with supplementary sampling. 
 4    Distribution very sporadic or concentrated    Not suitable for a general 
      in a small part of the region.                survey. Requires a sample 
                                                    geared to its distribution. 


4.9 SAMPLE SIZE WHEN ESTIMATES ARE WANTED 
FOR SUBDIVISIONS OF THE POPULATION 


It is often planned to present estimates not only for the population as a whole 
but for certain subdivisions. If these can be identified in advance, as with different 
geographical regions, a separate calculation of n is made for each region. Suppose 
that the mean of each subdivision is to be estimated with a specified variance V. 
For the ith subdivision, we have $n_i = S_i^2/V$, so that the total sample size 
$n = \sum S_i^2/V$. The individual $S_i^2$ will, on the average, be smaller than $S^2$, the population 
variance, but often they are only slightly smaller. Thus, if there are k subdivisions, 
$n \approx kS^2/V$, whereas if only the estimate for the population as a whole were 
wanted we would take $n = S^2/V$. 

Thus if estimates with variance V are wanted for each of k subdivisions the 
sample size may approach k times the n needed for an over-all estimate of the 
same precision. This point tends to be overlooked in calculations of sample size by 
persons inexperienced in survey methods. 

If the subdivisions represent classifications by variables such as age, sex, 
income, and years of schooling, the subdivision to which a person belongs is not 
known until the sample has been taken. Advance sample size estimates can still be 
made if the proportions $\pi_i$ of the units that belong to the various subdivisions are 
known. If a simple random sample of size n is selected, the expected size of sample 
from the ith subdivision is $n\pi_i$. The average variance of the mean from this 
subdivision is 

$$V(\bar{y}_i) = E\left(\frac{S_i^2}{n_i}\right) \approx \frac{S_i^2}{n\pi_i} \tag{4.10}$$

if $n\pi_i$ is large. Hence we require $n = S_i^2/\pi_iV$ in order to make $V(\bar{y}_i) = V$. If this is 
to hold for every subdivision, 

$$n = \max_i\left(\frac{S_i^2}{\pi_iV}\right) \tag{4.11}$$

If the subdivisions are into classes like age, income, $S_i^2/\pi_i$ may be less than $S^2$ for 
central classes, but may be large for an extreme class with small $\pi_i$. In this event, 
we may either have to increase the value of V in this subdivision or find some way 
of identifying units in this subdivision in advance so that they can be 
sampled at a higher rate. The method of double sampling (Chapter 12) is 
sometimes useful for this purpose. 

The demands on sample size are still greater in analytical studies in which the 
specifications are 

$$V(\bar{y}_i - \bar{y}_j) \le V \tag{4.12}$$

for every pair of subdivisions (domains). In this case 

$$n = \max_{i,j}\,\frac{1}{V}\left(\frac{S_i^2}{\pi_i} + \frac{S_j^2}{\pi_j}\right) \tag{4.13}$$

If the $S_i^2$ are not very different from $S^2$, n will be $2kS^2/V$ when the k domains are 
of equal size, and still greater otherwise. The effect of fpc terms, neglected in this 
discussion, is to reduce the required n’s to some extent. 
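
The two rules for subdivisions can be compared directly. The following Python sketch assumes the forms of (4.11) and (4.13); the function names and the numerical values are hypothetical, and the fpc is ignored as in the text.

```python
from itertools import combinations

def n_for_domain_means(S2, pi, V):
    """(4.11): n = max over i of S_i^2 / (pi_i * V)."""
    return max(s2 / (p * V) for s2, p in zip(S2, pi))

def n_for_domain_differences(S2, pi, V):
    """(4.13): n = max over pairs (i, j) of (S_i^2/pi_i + S_j^2/pi_j) / V."""
    terms = [s2 / p for s2, p in zip(S2, pi)]
    return max((a + b) / V for a, b in combinations(terms, 2))

# Illustrative values: k = 4 equal-sized domains, S_i^2 = S^2 = 1, V = 0.01
k, S2_pop, V = 4, 1.0, 0.01
S2 = [S2_pop] * k
pi = [1.0 / k] * k

n_overall = S2_pop / V                        # estimate for the whole population
n_means = n_for_domain_means(S2, pi, V)       # about k * S^2 / V
n_diffs = n_for_domain_differences(S2, pi, V) # about 2k * S^2 / V
print(round(n_overall), round(n_means), round(n_diffs))
```

With equal domains the three requirements stand in the ratio 1 : k : 2k, which is the point made above about the cost of domain estimates.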


4.10 SAMPLE SIZE IN DECISION PROBLEMS 


A more logical approach to the determination of sample size can sometimes be 
developed when a practical decision is to be made from the results of the sample. 
The decision will presumably be more soundly based if the sample estimate has a 
low error than if it has a high error. We may be able to calculate, in monetary 
terms, the loss l(z) that will be incurred in a decision through an error of amount z 
in the estimate. Although the actual value of z is not predictable in advance, 
sampling theory enables us to find the frequency distribution f(z, n) of z which, 
for a specified sampling method, will depend on the sample size n. Hence the 
expected loss for a given size of sample is 

$$L(n) = \int l(z)\,f(z, n)\,dz \tag{4.14}$$

The purpose in taking the sample is to diminish this loss. If C(n) is the cost of a 
sample of size n, a reasonable procedure is to choose n to minimize 

$$C(n) + L(n) \tag{4.15}$$

since this is the total cost involved in taking the sample and in making decisions 
from its results. The choice of n determines both the optimum size of sample and 
the most advantageous degree of precision. 

Alternatively, the same approach can be presented in terms of the monetary 
gain that accrues from having the sample information, instead of in terms of the 
loss that arises from errors in the sample information. If monetary gain is used, we 
construct an expected gain G(n) from a sample of size n, where G(n) is zero if no 
sample is taken. We maximize 

$$G(n) - C(n)$$


In this form the principle is equivalent to the rule in classical economics that profit 
is to be maximized. 
The simplest application occurs when the loss function, l(z), is $Az^2$, where A is a 
constant. It follows that 

$$L(n) = AE(z^2) \tag{4.16}$$

For instance, if $\bar{y}$ is the sample estimate of $\bar{Y}$, and $z = \bar{y} - \bar{Y}$, 

$$L(n) = AV(\bar{y}) = \frac{AS^2}{n} - \frac{AS^2}{N} \tag{4.17}$$

if simple random sampling is used. 
The simplest type of cost function for the sample is 

$$C(n) = c_0 + c_1n \tag{4.18}$$

where $c_0$ is the overhead cost. By differentiation, the value of n that minimizes 
cost plus loss is 

$$n = \sqrt{AS^2/c_1} \tag{4.19}$$


A more general form of this result is given by Yates (1960). The same analysis 
applies to any method of sampling and estimation in which the variance of the 
estimate is inversely proportional to n and the cost is a linear function of n. 
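
The optimum in (4.19) can be checked against the total cost it is meant to minimize. This is a minimal Python sketch under the quadratic-loss and linear-cost assumptions of this section; the function names and all numerical values are hypothetical.

```python
import math

def optimum_n(A, S2, c1):
    """(4.19): n = sqrt(A * S^2 / c1) for loss A*z^2 and cost c0 + c1*n."""
    return math.sqrt(A * S2 / c1)

def total_cost(n, A, S2, c0, c1, N=None):
    """C(n) + L(n), with L(n) = A*S^2/n - A*S^2/N (the last term dropped if N is None)."""
    loss = A * S2 / n - (A * S2 / N if N is not None else 0.0)
    return c0 + c1 * n + loss

# Illustrative values only
A, S2, c0, c1 = 500.0, 85.6, 100.0, 2.0
n_opt = optimum_n(A, S2, c1)
print(round(n_opt, 1), round(total_cost(n_opt, A, S2, c0, c1), 1))
```

Neither the overhead $c_0$ nor the term $AS^2/N$ depends on n, so neither affects the optimum; they only shift the total cost.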

Blythe (1945) describes the application of this principle to the estimation of the 
volume of timber in a lot for selling purposes (see exercise 4.11). Nordin (1944) 
discusses the optimum size of sample for estimating potential sales in a market 
that a manufacturer intends to enter. If the sales can be forecast accurately, the 
amount of fixed equipment and the production per unit period can be allocated to 
maximize the manufacturer's expected profit. Grundy et al. (1954, 1956) consider 
the optimum size of a second sample when the results of a first sample are already 
known. 

This approach has received substantial further development from workers on 
statistical decision theory. Generalizations include the substitution of utility for 
money value as a scale on which to measure costs and losses, the explicit use of 
subjective prior information about unknown parameters by expressing this infor- 
mation as “prior” probability distributions of the unknown parameters, and the 
investigation of different types of cost and loss functions and of qualitative as well 
as quantitative data. For a comprehensive account of the method, see Raiffa and 
Schlaifer (1961). Although it is still not evident how frequently decision prob- 
lems will be amenable to complete solution by this approach, the method has 
value in stimulating clear thinking about the important factors in a good decision. 
One area that appears suitable for applications is the sampling of lots of articles in 
a mass-production process in order to decide whether to accept or reject the lot on 
the basis of its estimated quality. Sittig (1951) considers the economics of 
sample-size determination, taking account of costs of inspection and the costs 
incurred through defective articles in accepted lots and good articles in rejected 
lots. 


4.11 THE DESIGN EFFECT (Deff) 


With the more complex sampling plans described later in this book, a useful 
quantity is the design effect (deff) of the plan (Kish, 1965). He describes this as the 
ratio of the variance of the estimate obtained from the (more complex) sample to 
the variance of the estimate obtained from a simple random sample of the same 
number of units. The design effect has two primary uses—in sample-size estima- 
tion and in appraising the efficiency of more complex plans. For instance, in 
estimating the proportion of people who possess some attribute, it is often 
convenient to use the household instead of the person as a sampling unit. As noted 
in Chapter 3, the formula PQ/n cannot be used with these plans. For estimating 
the proportion who had seen a doctor (section 3.12), a simple random sample of 
households gave v(p)=0.00520 as against pq/n =0.00197 for an equal-sized 
simple random sample of persons. An estimate of the deff for this cluster sample 
and this variate is 520/197 = 2.6. When the sampling fractions are small, we can 
therefore estimate sample size by calculating the n (number of persons) needed 
with a simple random sample of persons and multiplying by 2.6. By noting deff 
ratios in this way for the important variates with a complex plan, we can use the 
simple formulas in this chapter for estimating the sample size with the complex 
plan and also judge whether the complex plan is advantageous in efficiency 
relative to its cost and complexity. 
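
The use of the deff in sample-size estimation amounts to one division and one multiplication. The sketch below is illustrative Python with hypothetical function names; the two variances are those quoted above for the doctor-visit example, and the simple-random-sample size of 1000 persons is an assumed figure.

```python
def design_effect(var_complex, var_srs):
    """Ratio of the variance under the complex plan to that under simple
    random sampling with the same number of units."""
    return var_complex / var_srs

def n_for_complex_plan(n_srs, deff):
    """Scale the simple-random-sample size by the deff (small sampling fractions assumed)."""
    return deff * n_srs

deff = design_effect(0.00520, 0.00197)
print(round(deff, 1), round(n_for_complex_plan(1000, deff)))
```

The multiplication is only as good as the deff estimate, which applies to one variate and one plan; a different item in the same survey may have a quite different ratio.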

Estimating the deff from the results of a complex sample may require some 
algebra. We need to show how these results provide, if possible, unbiased 
estimates of $S^2$ or of PQ. Examples of these calculations are given for stratified 
random sampling in section 5A.11 and for cluster sampling with clusters of equal 
sizes in section 9.3. 


EXERCISES 


4.1 In a district containing 4000 houses the percentage of owned houses is to be 
estimated with a s.e. of not more than 2% and the percentage of two-car households with a 
s.e. of not more than 1%. (The figures 2 and 1% are the absolute values, not the cv’s.) The 
true percentage of owners is thought to lie between 45 and 65% and the percentage of 
two-car households between 5 and 10%. How large a sample is necessary to satisfy both 
aims? 

4.2 In the population of 676 petition sheets (Table 2.2, page 28) how large must the 
sample be if the total number of signatures is to be estimated with a margin of error of 1000, 
apart from a 1 in 20 chance? Assume that the value of $s^2$ given on page 28 is the population 
$S^2$. 


4.3 A survey is to be made of the prevalence of the common diseases in a large 
population. For any disease that affects at least 1% of the individuals in the population, it is 
desired to estimate the total number of cases, with a coefficient of variation of not more than 
20%. (a) What size of simple random sample is needed, assuming that the presence of the 
disease can be recognized without mistakes? (b) What size is needed if total cases are 
wanted separately for males and females, with the same precision? 

4.4 In a wireworm survey the number of wireworms per acre is to be estimated with 
a limit of error of 30%, at the 95% probability level, in any field in which wireworm 
density exceeds 200,000 per acre in the top 5 in. of soil. The sampling tool measures 
9 × 9 × 5 in. deep. Assuming that the number of wireworms in a single sample follows a 
distribution slightly more variable than the Poisson, we take $S^2 = 1.2\bar{Y}$. What size of simple 
random sample is needed? (1 acre = 43,560 sq ft.) 

4.5 The following coefficients of variation per unit were obtained in a farm survey in 
Iowa, the unit being an area 1 mile square (data of R. J. Jessen): 

Item                          Estimated cv (%) 
Acres in farms                       38 
Acres in corn                        39 
Acres in oats                        44 
Number of family workers            100 
Number of hired workers             110 
Number of unemployed                317 

A survey is planned to estimate acreage items with a cv of 2½% and numbers of workers 
(excluding unemployed) with a cv of 5%. With simple random sampling, how many units 
are needed? How well would this sample be expected to estimate the number of 
unemployed? 

4.6 By experimental sampling, the mean value of a random variate is to be estimated 
with variance V = 0.0005. The values of the random variate for the first 20 samples drawn 
are shown below. How many more samples are needed? (Use equation 4.7.) 

Sample    Value of            Sample    Value of 
Number    Random Variate      Number    Random Variate 
   1      0.0725                11      0.0712 
   2      0.0755                12      0.0748 
   3      0.0759                13      0.0878 
   4      0.0739                14      0.0710 
   5      0.0732                15      0.0754 
   6      0.0843                16      0.0712 
   7      0.0727                17      0.0757 
   8      0.0769                18      0.0737 
   9      0.0730                19      0.0704 
  10      0.0727                20      0.0723 

4.7 A household survey is designed to estimate the proportion of families possessing 
certain attributes. For the principal items of interest, the value of P is expected to lie 
between 30 and 70%. With simple random sampling, how large are the values of n 
necessary to estimate the following means with a standard error not exceeding 3%? (a) The 
over-all mean P? (b) The individual means P_i for the income classes: under $5000; $5000 
to $10,000; over $10,000 (i = 1, 2, 3)? (c) The differences between the means (P_i − P_j) for 
every pair of the classes in (b)? Give a separate answer for (a), (b), and (c). Income statistics 
indicate that the proportions of families with incomes in the three classes above are 50, 38, 
and 12%. 

4.8 The 4-year colleges in the United States were divided into classes of four different 
sizes according to their 1952-1953 enrollments. The standard deviations within each class 
are shown below. 

                                        Class 
                        1         2            3              4 
Number of students    <1000   1000-3000   3000-10,000   over 10,000 
S_i                    236       625         2008          10,023 

If you know the class boundaries but not the values of S_i, how well can you guess the S_i 
values by using simple mathematical figures (section 4.7)? No college has less than 200 
students and the largest has about 50,000 students. 

4.9 With a quadratic loss function and a linear cost function, as in section 4.10, $S^2$ is 
reduced to $S'^2$ by a superior sampling plan, $c_0$, $c_1$, and A remaining unchanged. If n′, V′ 
denote the new optimum sample size and the accompanying $V(\bar{y})$, show that n′ < n and 
that V′ < V. 

4.10 If the loss function due to an error in $\bar{y}$ is $A|\bar{y} - \bar{Y}|$ and if the cost $C = c_0 + c_1n$, show 
that with simple random sampling, ignoring the fpc, the most economical value of n is 

$$\left(\frac{AS}{c_1\sqrt{2\pi}}\right)^{2/3}$$

4.11 (Adapted from Blythe, 1945). The selling price of a lot of standing timber is UW, 
where U is the price per unit volume and W is the volume of timber on the lot. The number 
N of logs on the lot is counted, and the average volume per log is estimated from a simple 
random sample of n logs. The estimate is made and paid for by the seller and is 
provisionally accepted by the buyer. Later, the buyer finds out the exact volume purchased, 
and the seller reimburses him if he has paid for more than was delivered. If he has paid for 
less than was delivered, the buyer does not mention the fact. 

Construct the seller’s loss function. Assuming that the cost of measuring n logs is cn, find 
the optimum value of n. The standard deviation of the volume per log may be denoted by S 
and the fpc ignored. 

4.12 (a) The presence or absence of each of two characteristics is to be measured on 
each unit in a simple random sample from a large population. If $P_1$, $P_2$ are the percentages 
of units in the population that possess characteristics 1 and 2, a client wishes to estimate 
$(P_1 - P_2)$ with a standard error not exceeding two percentage points. What sample size do 
you suggest if the client thinks that $P_1$ and $P_2$ both lie between 40 and 60% and that the 
characteristics are independently distributed on the units? 

(b) Suppose that in (a) the client thinks that the characteristics are positively correlated, 
but does not know the correlation. You suggest an initial sample of 200, with the following 
results. 

  Characteristics 
   1       2        Number of units 
  Yes     Yes             72 
  Yes     No              44 
  No      Yes             14 
  No      No              70 
                         --- 
                         200 

What sample size do you now recommend to estimate $(P_1 - P_2)$ with a standard error ≤ 2%? 


4.13 (a) Suppose you are estimating the sex ratio, which is close to equality, and could 
sample households of four persons (father, mother, two children). Ignoring the small 
proportion of families with identical twins, find the deff factor for a simple random sample 
of n households versus one of the 4n persons. 

(b) Would identical twin families lower or raise the deff factor? 


CHAPTER 5 


Stratified Random Sampling 


5.1 DESCRIPTION 


In stratified sampling the population of N units is first divided into subpopulations 
of $N_1, N_2, \ldots, N_L$ units, respectively. These subpopulations are nonoverlapping, 
and together they comprise the whole of the population, so that 

$$N_1 + N_2 + \cdots + N_L = N$$

The subpopulations are called strata. To obtain the full benefit from stratification, 
the values of the $N_h$ must be known. When the strata have been determined, a 
sample is drawn from each, the drawings being made independently in different 
strata. The sample sizes within the strata are denoted by $n_1, n_2, \ldots, n_L$, 
respectively. 

If a simple random sample is taken in each stratum, the whole procedure is 
described as stratified random sampling. 

Stratification is a common technique. There are many reasons for this; the 
principal ones are the following. 


1. If data of known precision are wanted for certain subdivisions of the 
population, it is advisable to treat each subdivision as a “population” in its own 
right. 

2. Administrative convenience may dictate the use of stratification; for exam- 
ple, the agency conducting the survey may have field offices, each of which can 
supervise the survey for a part of the population. 

3. Sampling problems may differ markedly in different parts of the population. 
With human populations, people living in institutions (e.g., hotels, hospitals, 
prisons) are often placed in a different stratum from people living in ordinary 
homes because a different approach to the sampling is appropriate for the two 
situations. In sampling businesses we may possess a list of the large firms, which 
are placed in a separate stratum. Some type of area sampling may have to be used 
for the smaller firms. 

4. Stratification may produce a gain in precision in the estimates of characteristics 
of the whole population. It may be possible to divide a heterogeneous population 
into subpopulations, each of which is internally homogeneous. This is suggested 
by the name strata, with its implication of a division into layers. If each stratum is 
homogeneous, in that the measurements vary little from one unit to another, a 
precise estimate of any stratum mean can be obtained from a small sample in that 
stratum. These estimates can then be combined into a precise estimate for the 
whole population. 


The theory of stratified sampling deals with the properties of the estimates from 
a stratified sample and with the best choice of the sample sizes $n_h$ to obtain 
maximum precision. In this development it is taken for granted that the strata 
have already been constructed. The problems of how to construct strata and of 
how many strata there should be are postponed to a later stage (section 5A.7). 


5.2 NOTATION 


The suffix h denotes the stratum and i the unit within the stratum. The notation 
is a natural extension of that previously used. The following symbols all refer to 
stratum h. 

$N_h$                                                          total number of units 
$n_h$                                                          number of units in sample 
$y_{hi}$                                                       value obtained for the ith unit 
$W_h = N_h/N$                                                  stratum weight 
$f_h = n_h/N_h$                                                sampling fraction in the stratum 
$\bar{Y}_h = \sum_{i=1}^{N_h} y_{hi}/N_h$                      true mean 
$\bar{y}_h = \sum_{i=1}^{n_h} y_{hi}/n_h$                      sample mean 
$S_h^2 = \sum_{i=1}^{N_h}(y_{hi} - \bar{Y}_h)^2/(N_h - 1)$     true variance 

Note that the divisor for the variance is $(N_h - 1)$. 


5.3 PROPERTIES OF THE ESTIMATES 


For the population mean per unit, the estimate used in stratified sampling is $\bar{y}_{st}$ 
(st for stratified), where 

$$\bar{y}_{st} = \frac{\sum_{h=1}^{L} N_h\bar{y}_h}{N} = \sum_{h=1}^{L} W_h\bar{y}_h \tag{5.1}$$

where $N = N_1 + N_2 + \cdots + N_L$. 
The estimate $\bar{y}_{st}$ is not in general the same as the sample mean. The sample 
mean, $\bar{y}$, can be written as 

$$\bar{y} = \frac{\sum_{h=1}^{L} n_h\bar{y}_h}{n} \tag{5.2}$$

The difference is that in $\bar{y}_{st}$ the estimates from the individual strata receive their 
correct weights $N_h/N$. It is evident that $\bar{y}$ coincides with $\bar{y}_{st}$ provided that in every 
stratum 

$$\frac{n_h}{n} = \frac{N_h}{N} \quad\text{or}\quad \frac{n_h}{N_h} = \frac{n}{N} \quad\text{or}\quad f_h = f$$

This means that the sampling fraction is the same in all strata. This stratification is 
described as stratification with proportional allocation of the $n_h$. It gives a 
self-weighting sample. If numerous estimates have to be made, a self-weighting 
sample is time-saving. 
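
The difference between (5.1) and (5.2) shows up as soon as the sampling fractions differ. The following Python sketch uses hypothetical stratum sizes and stratum sample means; the function names are illustrative.

```python
def stratified_mean(N_h, ybar_h):
    """(5.1): y_st = sum(N_h * ybar_h) / N, weighting each stratum by N_h / N."""
    N = sum(N_h)
    return sum(Nh * yb for Nh, yb in zip(N_h, ybar_h)) / N

def pooled_sample_mean(n_h, ybar_h):
    """(5.2): ybar = sum(n_h * ybar_h) / n, weighting each stratum by n_h / n."""
    n = sum(n_h)
    return sum(nh * yb for nh, yb in zip(n_h, ybar_h)) / n

# Hypothetical strata and stratum sample means
N_h = [200, 300, 500]
ybar_h = [10.0, 20.0, 40.0]

proportional = [20, 30, 50]   # n_h / N_h = 0.1 in every stratum
unequal = [50, 30, 20]        # sampling fractions differ by stratum

print(stratified_mean(N_h, ybar_h))              # 28.0
print(pooled_sample_mean(proportional, ybar_h))  # 28.0, self-weighting
print(pooled_sample_mean(unequal, ybar_h))       # 19.0, wrong weights
```

Under proportional allocation the plain sample mean can therefore be used as it stands, which is the saving in computation referred to above.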

The principal properties of the estimate $\bar{y}_{st}$ are outlined in the following 
theorems. The first two theorems apply to stratified sampling in general and are 
not restricted to stratified random sampling; that is, the sample from any stratum 
need not be a simple random sample. 


Theorem 5.1. If in every stratum the sample estimate $\bar{y}_h$ is unbiased, then $\bar{y}_{st}$ is 
an unbiased estimate of the population mean $\bar{Y}$. 
Proof. 

$$E(\bar{y}_{st}) = E\sum_{h=1}^{L} W_h\bar{y}_h = \sum_{h=1}^{L} W_h\bar{Y}_h$$

since the estimates are unbiased in the individual strata. But the population mean 
$\bar{Y}$ may be written 

$$\bar{Y} = \frac{\sum_{h=1}^{L}\sum_{i=1}^{N_h} y_{hi}}{N} = \frac{\sum_{h=1}^{L} N_h\bar{Y}_h}{N} = \sum_{h=1}^{L} W_h\bar{Y}_h$$

This completes the proof. 
Theorem 5.2. If the samples are drawn independently in different strata,

$$V(\bar{y}_{st}) = \sum_{h=1}^{L} W_h^2 V(\bar{y}_h) \tag{5.3}$$

where $V(\bar{y}_h)$ is the variance of $\bar{y}_h$ over repeated samples from stratum $h$.
Proof. Since

$$\bar{y}_{st} = \sum_{h=1}^{L} W_h \bar{y}_h \tag{5.4}$$

$\bar{y}_{st}$ is a linear function of the $\bar{y}_h$ with fixed weights $W_h$. Hence we may quote the
result in statistics for the variance of a linear function.

$$V(\bar{y}_{st}) = \sum_{h=1}^{L} W_h^2 V(\bar{y}_h) + 2 \sum_{h=1}^{L} \sum_{j>h} W_h W_j \operatorname{Cov}(\bar{y}_h, \bar{y}_j) \tag{5.5}$$

But since samples are drawn independently in different strata, all covariance
terms vanish. This gives the result (5.3).

To summarize theorems 5.1 and 5.2: if $\bar{y}_h$ is an unbiased estimate of $\bar{Y}_h$ in every
stratum, and sample selection is independent in different strata, then $\bar{y}_{st}$ is an
unbiased estimate of $\bar{Y}$ with variance $\sum W_h^2 V(\bar{y}_h)$.

The important point about this result is that the variance of $\bar{y}_{st}$ depends only on
the variances of the estimates of the individual stratum means $\bar{Y}_h$. If it were
possible to divide a highly variable population into strata such that all items had
the same value within a stratum, we could estimate $\bar{Y}$ without any error. Equation
(5.4) shows that it is the use of the correct stratum weights $N_h/N$ in making the
estimate $\bar{y}_{st}$ that leads to this result.


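Theorem 5.3. For stratified random sampling, the variance of the estimate $\bar{y}_{st}$
is

$$V(\bar{y}_{st}) = \frac{1}{N^2} \sum_{h=1}^{L} N_h (N_h - n_h) \frac{S_h^2}{n_h} = \sum_{h=1}^{L} W_h^2 \frac{S_h^2}{n_h} (1 - f_h) \tag{5.6}$$

Proof. Since $\bar{y}_h$ is an unbiased estimate of $\bar{Y}_h$, theorem 5.2 can be applied.
Furthermore, by theorem 2.2, applied to an individual stratum,

$$V(\bar{y}_h) = \frac{S_h^2}{n_h} \, \frac{N_h - n_h}{N_h}$$

By substitution into the result of theorem 5.2, we obtain

$$V(\bar{y}_{st}) = \frac{1}{N^2} \sum_{h=1}^{L} N_h^2 V(\bar{y}_h) = \frac{1}{N^2} \sum_{h=1}^{L} N_h (N_h - n_h) \frac{S_h^2}{n_h} = \sum_{h=1}^{L} W_h^2 \frac{S_h^2}{n_h} (1 - f_h)$$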
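Theorems 5.1 and 5.3 can be checked numerically on a very small population by listing every possible stratified sample. The sketch below does this for a hypothetical two-stratum population; the values in `strata` and the sample sizes in `n_h` are assumptions made only for illustration and do not come from the text.

```python
from itertools import combinations, product
from statistics import mean, variance

# Hypothetical toy population: two strata, values chosen only for illustration.
strata = [[1.0, 3.0, 8.0], [10.0, 14.0, 15.0, 21.0]]
n_h = [2, 2]  # units drawn per stratum (assumed)

N = sum(len(s) for s in strata)
W = [len(s) / N for s in strata]          # stratum weights W_h = N_h / N
pop_mean = sum(sum(s) for s in strata) / N

# A stratified random sample is one simple random sample from each stratum,
# so the full set of samples is the product of the per-stratum combinations.
per_stratum = [list(combinations(s, k)) for s, k in zip(strata, n_h)]
estimates = [
    sum(w * mean(sample) for w, sample in zip(W, choice))
    for choice in product(*per_stratum)
]

# All samples are equally likely, so plain averages give E and V.
expected = mean(estimates)
enumerated_var = sum((e - expected) ** 2 for e in estimates) / len(estimates)

# Formula (5.6); statistics.variance uses the divisor (N_h - 1), as S_h^2 does.
formula_var = sum(
    w ** 2 * variance(s) / k * (1 - k / len(s))
    for w, s, k in zip(W, strata, n_h)
)

print(expected, pop_mean)            # the two means should agree
print(enumerated_var, formula_var)   # the two variances should agree
```

Enumeration is feasible only for tiny populations, but it makes the meaning of "variance over repeated samples" concrete.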


Some particular cases of this formula are given in the following corollaries. 
Corollary 1. If the sampling fractions $n_h/N_h$ are negligible in all strata,

$$V(\bar{y}_{st}) = \frac{1}{N^2} \sum \frac{N_h^2 S_h^2}{n_h} = \sum \frac{W_h^2 S_h^2}{n_h} \tag{5.7}$$

This is the appropriate formula when finite population corrections can be ignored.


Corollary 2. With proportional allocation, we substitute

$$n_h = \frac{n N_h}{N}$$

in (5.6). The variance reduces to

$$V(\bar{y}_{st}) = \sum \frac{N_h}{N} \, \frac{S_h^2}{n} \left( \frac{N - n}{N} \right) = \frac{1 - f}{n} \sum W_h S_h^2 \tag{5.8}$$


Corollary 3. If sampling is proportional and the variances in all strata have the
same value, $S_w^2$, we obtain the simple result

$$V(\bar{y}_{st}) = \frac{S_w^2}{n} \left( \frac{N - n}{N} \right) \tag{5.9}$$


Theorem 5.4. If $\hat{Y}_{st} = N \bar{y}_{st}$ is the estimate of the population total $Y$, then

$$V(\hat{Y}_{st}) = \sum N_h (N_h - n_h) \frac{S_h^2}{n_h} \tag{5.10}$$

This follows at once from theorem 5.3.


Example. Table 5.1 shows the 1920 and 1930 number of inhabitants, in thousands, of 
64 large cities in the United States. The data were obtained by taking the cities which 
ranked fifth to sixty-eighth in the United States in total number of inhabitants in 1920. The 
cities are arranged in two strata, the first containing the 16 largest cities and the second the 
remaining 48 cities. 

The total number of inhabitants in all 64 cities in 1930 is to be estimated from a sample of 
size 24. Find the standard error of the estimated total for (1) a simple random sample, (2) a 
stratified random sample with proportional allocation, (3) a stratified random sample with 
12 units drawn from each stratum. 

This population resembles the populations of many types of business enterprise in that 
some units—the large cities—contribute very substantially to the total and display much 
greater variability than the remainder. 


The stratum totals and sums of squares are given under Table 5.1. Only the 1930 data are
used in this example: the 1920 data appear in a later example.
For the complete population in 1930, we find

$$Y = 19{,}568, \qquad S^2 = 52{,}448$$

The three estimates of $Y$ are denoted by $\hat{Y}_{ran}$, $\hat{Y}_{prop}$, and $\hat{Y}_{equal}$.
1. For simple random sampling

$$V(\hat{Y}_{ran}) = \frac{N^2 S^2}{n} \, \frac{N - n}{N} = \frac{(64)^2 (52{,}448)}{24} \left( \frac{40}{64} \right) = 5{,}594{,}453$$
TABLE 5.1
SIZES OF 64 CITIES (IN 1000's) IN 1920 AND 1930

1930 Size ($y_{hi}$)

Stratum 1      Stratum 2
   900        364   209   113
   822        317   183   115
   781        328   163   123
   805        302   253   154
   670        288   232   140
  1238        291   260   119
   573        253   201   130
   634        291   147   127
   578        308   292   100
   487        272   164   107
   442        284   143   114
   451        255   169   111
   459        270   139   163
   464        214   170   116
   400        195   150   122
   366        260   143   134

Note. Cities are arranged in the same order in both years.

Totals and sums of squares

                  1920                                  1930
Stratum   $\sum x_{hi}$   $\sum x_{hi}^2$      $\sum y_{hi}$   $\sum y_{hi}^2$
   1         8,349     4,756,619        10,070     7,145,450
   2         7,941     1,474,871         9,498     2,141,720

from theorem 2.2, corollary 2. The standard error is

$$\sigma(\hat{Y}_{ran}) = 2365$$

2. For the individual strata the variances are

$$S_1^2 = 53{,}843, \qquad S_2^2 = 5581$$

Note that the stratum with the largest cities has a variance nearly 10 times that of the other
stratum.
In proportional allocation, we have $n_1 = 6$, $n_2 = 18$. From (5.8), multiplying by $N^2$, we
have

$$V(\hat{Y}_{prop}) = \frac{N - n}{n} \sum N_h S_h^2 = \frac{5}{3} [(16)(53{,}843) + (48)(5581)] = 1{,}882{,}293$$

$$\sigma(\hat{Y}_{prop}) = 1372$$

3. For $n_1 = n_2 = 12$ we use the general formula (5.10):

$$V(\hat{Y}_{equal}) = \sum \frac{N_h (N_h - n_h)}{n_h} S_h^2 = \frac{(16)(4)(53{,}843)}{12} + \frac{(48)(36)(5581)}{12} = 1{,}090{,}827$$

$$\sigma(\hat{Y}_{equal}) = 1044$$


In this example equal sample sizes in the two strata are more precise than proportional 
allocation. Both are greatly superior to simple random sampling. 
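The three standard errors in this example can be recomputed from the stratum totals and sums of squares listed under Table 5.1. The following is a minimal sketch; the helper names `s2` and `var_total_stratified` are illustrative, not part of the text. Because it carries unrounded variances, the intermediate figures differ slightly from the hand calculation, but the standard errors agree after rounding.

```python
from math import sqrt

# 1930 stratum sizes, totals, and sums of squares from Table 5.1.
N_h = [16, 48]
sum_y = [10_070, 9_498]
sum_y2 = [7_145_450, 2_141_720]
n = 24
N = sum(N_h)

def s2(total, total_sq, size):
    """Variance with divisor (size - 1), computed from a total and a sum of squares."""
    return (total_sq - total ** 2 / size) / (size - 1)

S2_h = [s2(t, q, m) for t, q, m in zip(sum_y, sum_y2, N_h)]
S2 = s2(sum(sum_y), sum(sum_y2), N)

def var_total_stratified(n_h):
    """Formula (5.10): variance of the estimated total for a given allocation."""
    return sum(Nh * (Nh - nh) * v / nh for Nh, nh, v in zip(N_h, n_h, S2_h))

v_ran = N ** 2 * S2 / n * (N - n) / N        # simple random sample
v_prop = var_total_stratified([6, 18])       # proportional allocation
v_equal = var_total_stratified([12, 12])     # 12 units from each stratum

for name, v in [("random", v_ran), ("proportional", v_prop), ("equal", v_equal)]:
    print(name, round(sqrt(v)))
# prints 2365, 1372 and 1044 for the three designs
```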


5.4 THE ESTIMATED VARIANCE AND 
CONFIDENCE LIMITS 


If a simple random sample is taken within each stratum, an unbiased estimate of
$S_h^2$ (from theorem 2.4) is

$$s_h^2 = \frac{1}{n_h - 1} \sum_{i=1}^{n_h} (y_{hi} - \bar{y}_h)^2 \tag{5.11}$$

Hence we obtain the following.
Theorem 5.5. With stratified random sampling, an unbiased estimate of the
variance of $\bar{y}_{st}$ is

$$v(\bar{y}_{st}) = s^2(\bar{y}_{st}) = \frac{1}{N^2} \sum_{h=1}^{L} N_h (N_h - n_h) \frac{s_h^2}{n_h} \tag{5.12}$$

An alternative form for computing purposes is

$$s^2(\bar{y}_{st}) = \sum_{h=1}^{L} \frac{W_h^2 s_h^2}{n_h} - \sum_{h=1}^{L} \frac{W_h s_h^2}{N} \tag{5.13}$$

The second term on the right represents the reduction due to the fpc.

In order to compute this estimate, there must be at least two units drawn from
every stratum. Estimation of the variance when stratification is carried to the point
at which only one unit is chosen per stratum is discussed in section 5A.12.

The formulas for confidence limits are as follows.

Population mean: $\bar{y}_{st} \pm t \, s(\bar{y}_{st})$ (5.14)
Population total: $N \bar{y}_{st} \pm t N s(\bar{y}_{st})$ (5.15)
These formulas assume that $\bar{y}_{st}$ is normally distributed and that $s(\bar{y}_{st})$ is well
determined, so that the multiplier $t$ can be read from tables of the normal
distribution.

If only a few degrees of freedom are provided by each stratum, the usual
procedure for taking account of the sampling error attached to a quantity like
$s(\bar{y}_{st})$ is to read the $t$-value from the tables of Student's $t$ instead of from the
normal table. The distribution of $s(\bar{y}_{st})$ is in general too complex to allow a strict
application of this method. An approximate method of assigning an effective
number of degrees of freedom to $s(\bar{y}_{st})$ is as follows (Satterthwaite, 1946).

We may write

$$s^2(\bar{y}_{st}) = \frac{1}{N^2} \sum_{h=1}^{L} g_h s_h^2, \quad\text{where}\quad g_h = \frac{N_h (N_h - n_h)}{n_h}$$

The effective number of degrees of freedom $n_e$ is

$$n_e = \frac{\left( \sum g_h s_h^2 \right)^2}{\sum \dfrac{g_h^2 s_h^4}{n_h - 1}} \tag{5.16}$$

The value of $n_e$ always lies between the smallest of the values $(n_h - 1)$ and their
sum. The approximation takes account of the fact that $S_h^2$ may vary from stratum
to stratum. It requires the assumption that the $y_{hi}$ are normal, since it depends on
the result that the variance of $s_h^2$ is $2\sigma_h^4/(n_h - 1)$. If the distribution of $y_{hi}$ has
positive kurtosis, the variance of $s_h^2$ will be larger than this and formula (5.16)
overestimates the effective degrees of freedom.
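As a computational sketch of (5.12) and (5.16), the function below returns the estimated variance of the stratified mean together with the effective degrees of freedom. The function name and the sample variances in `s2_h` are hypothetical, chosen only to exercise the formulas; the sketch assumes at least two units per stratum and at least one stratum that is not completely enumerated.

```python
def stratified_variance_and_df(N_h, n_h, s2_h):
    """Return (s^2(ybar_st), n_e) from formulas (5.12) and (5.16)."""
    N = sum(N_h)
    # g_h = N_h (N_h - n_h) / n_h
    g = [Nh * (Nh - nh) / nh for Nh, nh in zip(N_h, n_h)]
    gs2 = [gh * s2 for gh, s2 in zip(g, s2_h)]
    variance = sum(gs2) / N ** 2
    # Effective degrees of freedom; undefined if every g_h s_h^2 is zero.
    n_e = sum(gs2) ** 2 / sum(x ** 2 / (nh - 1) for x, nh in zip(gs2, n_h))
    return variance, n_e

# Hypothetical sample variances for a two-stratum design with 12 units each.
N_h = [16, 48]
n_h = [12, 12]
s2_h = [48_000.0, 6_200.0]

variance, n_e = stratified_variance_and_df(N_h, n_h, s2_h)
print(variance, n_e)
```

With these inputs the effective degrees of freedom fall between 11, the smallest $(n_h - 1)$, and 22, their sum, as the text states they must.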


5.5 OPTIMUM ALLOCATION 


In stratified sampling the values of the sample sizes $n_h$ in the respective strata
are chosen by the sampler. They may be selected to minimize $V(\bar{y}_{st})$ for a specified
cost of taking the sample or to minimize the cost for a specified value of $V(\bar{y}_{st})$.

The simplest cost function is of the form

$$\text{cost} = C = c_0 + \sum c_h n_h \tag{5.17}$$

Within any stratum the cost is proportional to the size of sample, but the cost per
unit $c_h$ may vary from stratum to stratum. The term $c_0$ represents an overhead
cost. This cost function is appropriate when the major item of cost is that of taking
the measurements on each unit. If travel costs between units are substantial,
empirical and mathematical studies suggest that travel costs are better represented
by the expression $\sum t_h \sqrt{n_h}$, where $t_h$ is the travel cost per unit (Beardwood
et al., 1959). Only the linear cost function (5.17) is considered here.
Theorem 5.6. In stratified random sampling with a linear cost function of the
form (5.17), the variance of the estimated mean $\bar{y}_{st}$ is a minimum for a specified
cost $C$, and the cost is a minimum for a specified variance $V(\bar{y}_{st})$, when $n_h$ is
proportional to $W_h S_h / \sqrt{c_h}$.

Proof. We have

$$C = c_0 + \sum_{h=1}^{L} c_h n_h \tag{5.17}$$

$$V = V(\bar{y}_{st}) = \sum_{h=1}^{L} \frac{W_h^2 S_h^2}{n_h} (1 - f_h) = \sum_{h=1}^{L} \frac{W_h^2 S_h^2}{n_h} - \sum_{h=1}^{L} \frac{W_h^2 S_h^2}{N_h} \tag{5.18}$$

Our problems are either (1) to choose the $n_h$ so as to minimize $V$ for specified $C$, or
(2) to choose the $n_h$ so as to minimize $C$ for specified $V$. It happens that apart from
their final steps, the problems have the same solution. Choosing the $n_h$ to
minimize $V$ for fixed $C$ or $C$ for fixed $V$ are both equivalent to minimizing the
product

$$V'C' = \left( V + \sum \frac{W_h^2 S_h^2}{N_h} \right) (C - c_0) = \left( \sum \frac{W_h^2 S_h^2}{n_h} \right) \left( \sum c_h n_h \right) \tag{5.19}$$


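Stuart (1954) has noted that (5.19) may be minimized neatly by use of the
Cauchy-Schwarz inequality. If $a_h$, $b_h$ are two sets of $L$ positive numbers, this
inequality comes from the identity

$$\left( \sum a_h^2 \right) \left( \sum b_h^2 \right) - \left( \sum a_h b_h \right)^2 = \sum_{i} \sum_{j>i} (a_i b_j - a_j b_i)^2 \tag{5.20}$$

It follows from (5.20) that

$$\left( \sum a_h^2 \right) \left( \sum b_h^2 \right) \ge \left( \sum a_h b_h \right)^2 \tag{5.21}$$

equality occurring if and only if $b_h/a_h$ is constant for all $h$. In (5.19) take

$$a_h = \frac{W_h S_h}{\sqrt{n_h}}, \qquad b_h = \sqrt{c_h n_h}, \qquad a_h b_h = W_h S_h \sqrt{c_h}$$

The inequality (5.21) gives

$$V'C' = \left( \sum \frac{W_h^2 S_h^2}{n_h} \right) \left( \sum c_h n_h \right) = \left( \sum a_h^2 \right) \left( \sum b_h^2 \right) \ge \left( \sum W_h S_h \sqrt{c_h} \right)^2$$

Thus, no choice of the $n_h$ can make $V'C'$ smaller than $\left( \sum W_h S_h \sqrt{c_h} \right)^2$. The
minimum value occurs when

$$\frac{b_h}{a_h} = \frac{n_h \sqrt{c_h}}{W_h S_h} = \text{constant} \tag{5.22}$$

as stated in the theorem.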
In terms of the total sample size $n$, we have

$$\frac{n_h}{n} = \frac{W_h S_h / \sqrt{c_h}}{\sum (W_h S_h / \sqrt{c_h})} = \frac{N_h S_h / \sqrt{c_h}}{\sum (N_h S_h / \sqrt{c_h})} \tag{5.23}$$


This theorem leads to the following rules of conduct. In a given stratum, take a 
larger sample if 


1. The stratum is larger. 
2. The stratum is more variable internally. 
3. Sampling is cheaper in the stratum. 


One further step is needed to complete the allocation. Equation (5.23) gives the
$n_h$ in terms of $n$, but we do not yet know what value $n$ has. The solution depends on
whether the sample is chosen to meet a specified total cost $C$ or to give a specified
variance $V$ for $\bar{y}_{st}$. If cost is fixed, substitute the optimum values of $n_h$ in the cost
function (5.17) and solve for $n$. This gives

$$n = \frac{(C - c_0) \sum (N_h S_h / \sqrt{c_h})}{\sum (N_h S_h \sqrt{c_h})} \tag{5.24}$$

If $V$ is fixed, substitute the optimum $n_h$ in the formula for $V(\bar{y}_{st})$. We find

$$n = \frac{\left( \sum W_h S_h \sqrt{c_h} \right) \sum (W_h S_h / \sqrt{c_h})}{V + (1/N) \sum W_h S_h^2} \tag{5.25}$$

where $W_h = N_h/N$.
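The allocation rule (5.23) and the two formulas for $n$, (5.24) and (5.25), translate directly into code. The sketch below uses a hypothetical three-stratum population; all numbers and function names are assumptions for illustration. It confirms that the fixed-cost allocation spends exactly the budget and that, when the variance achieved by that allocation is fed into (5.25), the same $n$ is recovered. It does not guard against allocations with $n_h > N_h$.

```python
from math import sqrt

def allocation_fixed_cost(N_h, S_h, c_h, C, c0=0.0):
    """Return (n, [n_h]) from (5.24) and (5.23) for a total budget C."""
    weights = [N * S / sqrt(c) for N, S, c in zip(N_h, S_h, c_h)]
    denom = sum(N * S * sqrt(c) for N, S, c in zip(N_h, S_h, c_h))
    n = (C - c0) * sum(weights) / denom
    return n, [n * w / sum(weights) for w in weights]

def n_fixed_variance(N_h, S_h, c_h, V):
    """Return n from (5.25) for a specified variance V of the mean."""
    N = sum(N_h)
    W = [Nh / N for Nh in N_h]
    a = sum(w * S * sqrt(c) for w, S, c in zip(W, S_h, c_h))
    b = sum(w * S / sqrt(c) for w, S, c in zip(W, S_h, c_h))
    return a * b / (V + sum(w * S ** 2 for w, S in zip(W, S_h)) / N)

def variance_of_mean(N_h, S_h, n_h):
    """Formula (5.6) for the variance of the stratified mean."""
    N = sum(N_h)
    return sum((Nh / N) ** 2 * S ** 2 / nh * (1 - nh / Nh)
               for Nh, S, nh in zip(N_h, S_h, n_h))

# Hypothetical strata: sizes, standard deviations, unit costs, and budget.
N_h = [400, 300, 300]
S_h = [10.0, 20.0, 40.0]
c_h = [1.0, 4.0, 9.0]
C, c0 = 1000.0, 100.0

n, n_h = allocation_fixed_cost(N_h, S_h, c_h, C, c0)
V_achieved = variance_of_mean(N_h, S_h, n_h)
print(n, n_h, V_achieved)
print(n_fixed_variance(N_h, S_h, c_h, V_achieved))  # should reproduce n
```

In practice the $n_h$ would be rounded to integers, which changes both cost and variance slightly.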
An important special case arises if $c_h = c$, that is, if the cost per unit is the same in
all strata. The cost becomes $C = c_0 + cn$, and optimum allocation for fixed cost
reduces to optimum allocation for fixed sample size. The result in this special case
is as follows.


Theorem 5.7. In stratified random sampling $V(\bar{y}_{st})$ is minimized for a fixed
total size of sample $n$ if

$$n_h = n \frac{W_h S_h}{\sum W_h S_h} = n \frac{N_h S_h}{\sum N_h S_h} \tag{5.26}$$

This allocation is sometimes called Neyman allocation, after Neyman (1934),
whose proof gave the result prominence. An earlier proof by Tschuprow (1923)
was later discovered.

A formula for the minimum variance with fixed $n$ is obtained by substituting the
value of $n_h$ in (5.26) into the general formula for $V(\bar{y}_{st})$. The result is

$$V_{min}(\bar{y}_{st}) = \frac{\left( \sum W_h S_h \right)^2}{n} - \frac{\sum W_h S_h^2}{N} \tag{5.27}$$

The second term on the right represents the fpc.


5.6 RELATIVE PRECISION OF STRATIFIED 
RANDOM AND SIMPLE RANDOM SAMPLING 


If intelligently used, stratification nearly always results in a smaller variance for 
the estimated mean or total than is given by a comparable simple random sample. 
It is not true, however, that any stratified random sample gives a smaller variance 
than a simple random sample. If the values of the n, are far from optimum, 
stratified sampling may have a higher variance. In fact, even stratification with 
optimum allocation for fixed total sample size may give a higher variance, 
although this result is an academic curiosity rather than something likely to 
happen in practice. 

In this section a comparison is made between simple random sampling and 
stratified random sampling with proportional and optimum allocation. This 
comparison shows how the gain due to stratification is achieved. 

The variances of the estimated means are denoted by $V_{ran}$, $V_{prop}$, and $V_{opt}$,
respectively.

Theorem 5.8. If terms in $1/N_h$ are ignored relative to unity,

$$V_{opt} \le V_{prop} \le V_{ran} \tag{5.28}$$

where the optimum allocation is for fixed $n$, that is, with $n_h \propto N_h S_h$.
Proof.

$$V_{ran} = \frac{S^2}{n} (1 - f) \tag{5.29}$$

$$V_{prop} = \frac{(1 - f)}{n} \sum W_h S_h^2 = \frac{\sum W_h S_h^2}{n} - \frac{\sum W_h S_h^2}{N} \tag{5.30}$$

[from equation (5.8), section 5.3]

$$V_{opt} = \frac{\left( \sum W_h S_h \right)^2}{n} - \frac{\sum W_h S_h^2}{N} \tag{5.31}$$

[from equation (5.27), section 5.5]
From the standard algebraic identity for the analysis of variance of a stratified
population, we have

$$(N - 1) S^2 = \sum_h \sum_i (y_{hi} - \bar{Y})^2 = \sum_h \sum_i (y_{hi} - \bar{Y}_h)^2 + \sum_h N_h (\bar{Y}_h - \bar{Y})^2 = \sum_h (N_h - 1) S_h^2 + \sum_h N_h (\bar{Y}_h - \bar{Y})^2 \tag{5.32}$$

If terms in $1/N_h$ are negligible and hence also in $1/N$, (5.32) gives

$$S^2 = \sum W_h S_h^2 + \sum W_h (\bar{Y}_h - \bar{Y})^2 \tag{5.33}$$

Hence

$$V_{ran} = \frac{(1 - f)}{n} S^2 = \frac{(1 - f)}{n} \sum W_h S_h^2 + \frac{(1 - f)}{n} \sum W_h (\bar{Y}_h - \bar{Y})^2 \tag{5.34}$$

$$= V_{prop} + \frac{(1 - f)}{n} \sum W_h (\bar{Y}_h - \bar{Y})^2 \tag{5.35}$$

By the definition of $V_{opt}$ we must have $V_{prop} \ge V_{opt}$. By (5.30) and (5.31) their
difference is

$$V_{prop} - V_{opt} = \frac{1}{n} \left[ \sum W_h S_h^2 - \left( \sum W_h S_h \right)^2 \right] = \frac{1}{n} \sum W_h (S_h - \bar{S})^2 \tag{5.36}$$

where $\bar{S} = \sum W_h S_h$ is a weighted mean of the $S_h$.
From (5.35) and (5.36), with terms in $1/N_h$ negligible,

$$V_{ran} = V_{opt} + \frac{1}{n} \sum W_h (S_h - \bar{S})^2 + \frac{(1 - f)}{n} \sum W_h (\bar{Y}_h - \bar{Y})^2 \tag{5.37}$$

To summarize, in equation (5.37) there are two components of the decrease in
variance as we change from simple random sampling to optimum allocation. The
first component (term on the extreme right) comes from the elimination of
differences among the stratum means; the second (middle term on the right)
comes from elimination of the effect of differences among the stratum standard
deviations. The second component represents the difference in variance between
optimum and proportional allocation.

If terms in $1/N_h$ are not negligible, substitution for $S^2$ from (5.32) leads to

$$V_{ran} = V_{prop} + \frac{(1 - f)}{n (N - 1)} \left[ \sum N_h (\bar{Y}_h - \bar{Y})^2 - \frac{1}{N} \sum (N - N_h) S_h^2 \right] \tag{5.38}$$

instead of to (5.35).
It follows that proportional stratification gives a higher variance than simple
random sampling if

$$\sum N_h (\bar{Y}_h - \bar{Y})^2 < \frac{1}{N} \sum (N - N_h) S_h^2 \tag{5.39}$$

Mathematically, this can happen. Suppose that the $S_h^2$ are all equal to $S_w^2$, so that
proportional allocation is optimum in the sense of Neyman. Then (5.39) becomes

$$\sum N_h (\bar{Y}_h - \bar{Y})^2 < (L - 1) S_w^2$$

or

$$\frac{\sum N_h (\bar{Y}_h - \bar{Y})^2}{L - 1} < S_w^2 \tag{5.40}$$

Those familiar with the analysis of variance will recognize this relation as implying
that the mean square among strata is smaller than the mean square within strata,
that is, that the F-ratio is less than 1.
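A small numerical case shows that condition (5.39) can indeed hold. In the hypothetical population below (values assumed purely for illustration) the two stratum means are equal, so stratification removes nothing, and the exact variance under proportional allocation exceeds that of a simple random sample of the same size.

```python
from statistics import mean, variance

# Hypothetical population: both strata have mean 6, so stratifying cannot help.
strata = [[2.0, 6.0, 10.0], [3.0, 6.0, 9.0]]
n = 2                       # proportional allocation takes one unit per stratum

N_h = [len(s) for s in strata]
N = sum(N_h)
f = n / N
everything = [y for s in strata for y in s]
grand_mean = mean(everything)

# Exact variances: (5.29) for simple random sampling, (5.8) for proportional.
v_ran = (1 - f) / n * variance(everything)
v_prop = (1 - f) / n * sum(Nh / N * variance(s) for Nh, s in zip(N_h, strata))

# The two sides of condition (5.39).
lhs = sum(Nh * (mean(s) - grand_mean) ** 2 for Nh, s in zip(N_h, strata))
rhs = sum((N - Nh) * variance(s) for Nh, s in zip(N_h, strata)) / N

print(v_ran, v_prop)   # v_prop is the larger of the two
print(lhs < rhs)       # condition (5.39) holds
```

The example is contrived, which is consistent with the remark that such cases are an academic curiosity.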


5.7 WHEN DOES STRATIFICATION PRODUCE 
LARGE GAINS IN PRECISION? 


The ideal variate for stratification is the value of $y$ itself, the quantity to be
measured in the survey. If we could stratify by the values of $y$, there would be no
overlap between strata, and the variance within strata would be much smaller than
the over-all variance, particularly if there were many strata. This situation is
illustrated by the example in section 5.3, page 94. The population consisted of the
sizes (numbers of inhabitants) of 64 cities in 1930, stratified by size. Although
there were only two strata, proportional stratification reduced the s.e. $(\hat{Y})$ from
2365 to 1372. Stratification with $n_1 = n_2 = 12$, which is optimum under Neyman
allocation, produced a further reduction to 1044.

In practice, of course, we cannot stratify by the values of y. But some important 
applications come close to this situation, and therefore give large gains in 
precision, by satisfying the following three conditions. 


1. The population is composed of institutions varying widely in size. 

2. The principal variables to be measured are closely related to the sizes of the 
institutions. 

3. A good measure of size is available for setting up the strata. 


Examples are businesses of a specific kind, for example, groceries (in surveys 
dealing with the volume of business or number of employees), schools (in surveys 
related to numbers of pupils), hospitals (in studies of patient load), and income tax 
returns (for items highly correlated with taxable income). In the United States 
farms also vary greatly in size as measured by total acreage or gross income, but 
common farm items, such as the production of particular crops or types of
livestock, often exhibit only a moderate correlation with farm size, so that the
gains from stratification by farm size are not huge.

If the size of the institution remains stable through time, at least for short
periods, then its best practical measure is usually the size of the institution on some
recent occasion when a census was taken. The example in section 5.3 illustrates
the situation in which good previous data are available. Table 5.2 shows the $S_h$ and
the resulting optimum $n_h \propto N_h S_h$ when the allocation is made from 1920 and
1930 data, respectively.

The 1920 data indicate an $n_1$ of 11.56, as against a "true" optimum of 12.21 for
the 1930 data. When rounded to integers, both sets of data give the same
allocation: a sample size of 12 from each stratum.
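The Neyman allocation of Theorem 5.7 is a one-line computation. The sketch below applies formula (5.26) to the city example, using the stratum standard deviations quoted in Table 5.2; the function name `neyman_allocation` is illustrative only.

```python
def neyman_allocation(N_h, S_h, n):
    """Formula (5.26): n_h proportional to N_h * S_h, summing to n."""
    products = [N * S for N, S in zip(N_h, S_h)]
    total = sum(products)
    return [n * p / total for p in products]

N_h = [16, 48]
alloc_1920 = neyman_allocation(N_h, [163.30, 58.55], 24)
alloc_1930 = neyman_allocation(N_h, [232.04, 74.71], 24)

print([round(x, 2) for x in alloc_1920])  # [11.56, 12.44]
print([round(x, 2) for x in alloc_1930])  # [12.21, 11.79]
```

Both sets of standard deviations round to 12 units per stratum, in agreement with the text.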


TABLE 5.2
CALCULATION OF THE OPTIMUM ALLOCATION

                         1920 Data                      1930 Data
Stratum   $N_h$     $S_h$    $N_h S_h$    $n_h$       $S_h$    $N_h S_h$    $n_h$
   1       16    163.30   2612.80   11.56     232.04   3712.64   12.21
   2       48     58.55   2810.40   12.44      74.71   3586.08   11.79
Totals                    5423.20   24.00              7298.72   24.00


Note that the optimum sampling fraction is 75% in stratum 1 but only 25% in
stratum 2. It is often found that because of the high variability of the stratum
consisting of the largest institutions, the formula calls for 100% sampling in this
stratum. Indeed, the allocation may call for more than 100% sampling (see section
5.8). Note also that the $S_h$ are smaller in 1920 than in 1930. The 1920 data give an
overoptimistic impression of the precision to be obtained in a 1930 survey. As
mentioned in section 4.7, the possibility of a change in the levels of the $S_h$ should
always be considered when using past data, even though an allowance for change
may have to be something of a guess.

Geographic stratification, in which the strata are compact areas such as counties
or neighbourhoods in a city, is common, often for administrative convenience or
because separate data are wanted for each stratum. It is usually accompanied by
some increase in precision because many factors operate to make people living or
crops growing in the same area show similarities in their principal characteristics.
The gains from geographic stratification, however, are generally modest. For
example, Table 5.3 shows data published by Jessen (1942) and Jessen and
Houseman (1944) on the effectiveness of geographic stratification for a number of
typical farm economic items.
Four sizes of stratum are represented: the township, the county, the "type of
farming" area, and the state. To give some idea of the relative sizes of the strata,
there are about 1600 townships, 100 counties, and 5 areas in Iowa.

In the table the precision of a method of stratification is taken as inversely
proportional to the value of $V(\bar{y}_{st})$ given by the method. Thus the relative
precision of method 1 to method 2 is the ratio $V_2(\bar{y}_{st})/V_1(\bar{y}_{st})$, expressed as a
percentage. The data shown are averages over the numbers of items given in the
second column. The county is taken as a standard in each case. As indicated, the
gains in precision are moderate. In Iowa the use of 1600 strata (townships)
compared with no stratification (state) increases the precision by about 30%; that
is, it reduces the variance by about 25%.


TABLE 5.3
RELATIVE PRECISION OF DIFFERENT KINDS OF GEOGRAPHIC
STRATIFICATION (IN PER CENT)

                                             Stratum
State                  No. of Items   Township   County   Type of Farming Area   State

Iowa, 1938 18 115 100 96 91
Iowa, 1939 19 121 100 97 91
Florida, 1942
Citrus fruit area 14 144 100
Truck farming area 15 111 100
California, 1942 17 113 100 97


As regards proportional versus optimum stratification, there are two situations
in which optimum stratification wins handsomely. The first is the case, already
discussed, in which the population consists of large and small institutions,
stratified by some measure of size. The variances $S_h^2$ are usually much greater for
the large institutions than for the small, making proportional stratification
inefficient. The second situation is found in surveys in which some strata are much more
expensive to sample than others. The influence of the factors $\sqrt{c_h}$ may make
proportional allocation poor.

When planning an allocation in which the estimated $n_h$ do not differ greatly
from proportionality, it is worthwhile to estimate how much larger $V(\bar{y}_{st})$ or
$V(\hat{Y}_{st})$ become if proportional allocation is used. The optima in the allocation
problem are rather flat (see section 5A.2) and the increase in variance may turn
out surprisingly small. Moreover, the superiority of the optimum, as computed
from estimated values of the $S_h$, is always exaggerated because of the errors in the
estimated $S_h$. The simplicity and the self-weighting feature of proportional
allocation are probably worth a 10 to 20% increase in variance.
5.8 ALLOCATION REQUIRING MORE THAN 100
PER CENT SAMPLING


As mentioned in section 5.7, the formula for the optimum may produce an $n_h$ in
some stratum that is larger than the corresponding $N_h$. Consider the example on
city sizes in section 5.3. A sample of 24 cities, distributed between two strata,
called for 12 cities out of 16 in the first stratum and 12 out of 48 in the second. Had
the sample size been 48, the allocation would demand 24 cities out of 16 in the first
stratum. The best that can be done is to take all cities in the stratum, leaving 32
cities for the second stratum instead of the 24 postulated by the formula. This
problem arises only when the over-all sampling fraction is substantial and some
strata are much more variable than others. It has occurred in practice on several
occasions.


If the original allocation gives $n_1 > N_1$ when there are more than two strata, the
optimum revised allocation is:

$$\tilde{n}_1 = N_1; \qquad \tilde{n}_h = (n - N_1) \frac{W_h S_h}{\sum_{h=2}^{L} W_h S_h}, \quad (h \ge 2) \tag{5.41}$$

provided that $\tilde{n}_h \le N_h$ for $h \ge 2$. If it should happen that $\tilde{n}_2 > N_2$, we change the
allocation to

$$\tilde{n}_1 = N_1; \qquad \tilde{n}_2 = N_2; \qquad \tilde{n}_h = (n - N_1 - N_2) \frac{W_h S_h}{\sum_{h=3}^{L} W_h S_h}, \quad (h \ge 3) \tag{5.41}$$

provided that $\tilde{n}_h \le N_h$ for $h \ge 3$. We continue this process until every $\tilde{n}_h \le N_h$. The
resulting allocation may be shown to be optimum for given $n$, as would be
expected.
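The revision process just described is iterative and is easily sketched in code. The function below is one possible implementation under stated assumptions, not a routine from the text: it caps every over-allocated stratum at $N_h$ in each pass (rather than one stratum at a time, which leads to the same result because capping a stratum can only raise the allocations of the others), and it assumes all $S_h$ are positive.

```python
def capped_neyman_allocation(N_h, S_h, n):
    """Neyman allocation revised so that no n_h exceeds N_h (section 5.8)."""
    if n > sum(N_h):
        raise ValueError("n cannot exceed the population size")
    alloc = [0.0] * len(N_h)
    free = set(range(len(N_h)))      # strata not yet fixed at 100% sampling
    remaining = float(n)             # sample still to be distributed
    while free:
        total = sum(N_h[h] * S_h[h] for h in free)
        trial = {h: remaining * N_h[h] * S_h[h] / total for h in free}
        over = [h for h in free if trial[h] > N_h[h]]
        if not over:
            for h in free:
                alloc[h] = trial[h]
            break
        # Take every unit in the over-allocated strata, then re-allocate the rest.
        for h in over:
            alloc[h] = float(N_h[h])
            remaining -= N_h[h]
            free.remove(h)
    return alloc

# City example: with n = 48 the first stratum is sampled completely.
print(capped_neyman_allocation([16, 48], [232.04, 74.71], 48))  # [16.0, 32.0]
```

For $n = 24$ no stratum is capped and the function returns the ordinary Neyman allocation.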

Care must be taken to use the correct formula for $V(\bar{y}_{st})$. The general formula
(5.6) in section 5.3 is correct if the $\tilde{n}_h$ given by the revised optimum allocation
are substituted. Formula (5.27) for $V_{min}(\bar{y}_{st})$

$$V_{min}(\bar{y}_{st}) = \frac{\left( \sum W_h S_h \right)^2}{n} - \frac{\sum W_h S_h^2}{N} \tag{5.27}$$

no longer holds. If $\sum'$ denotes summation over the strata in which $\tilde{n}_h < N_h$, an
alternative correct formula is

$$V_{min}(\bar{y}_{st}) = \frac{\left( \sum' W_h S_h \right)^2}{n'} - \frac{\sum' W_h S_h^2}{N} \tag{5.42}$$

where $n'$ is the revised total sample size in these strata.
5.9 ESTIMATION OF SAMPLE SIZE WITH
CONTINUOUS DATA


Formulas for the determination of $n$ under an estimated optimum allocation
were given in section 5.5. The present section presents formulas for any allocation,
with some useful special cases. It is assumed that the estimate has a specified
variance $V$. If, instead, the margin of error $d$ (section 4.4) has been specified,
$V = (d/t)^2$, where $t$ is the normal deviate corresponding to the allowable probability
that the error will exceed the desired margin.


Estimation of the Population Mean $\bar{Y}$
Let $s_h$ be the estimate of $S_h$ and let $n_h = w_h n$, where the $w_h$ have been chosen. In
these terms the anticipated $V(\bar{y}_{st})$ (from theorem 5.3, section 5.3) is

$$V = \frac{1}{n} \sum \frac{W_h^2 s_h^2}{w_h} - \frac{1}{N} \sum W_h s_h^2 \tag{5.43}$$

with $W_h = N_h/N$. This gives, as a general formula for $n$,

$$n = \frac{\sum \dfrac{W_h^2 s_h^2}{w_h}}{V + \dfrac{1}{N} \sum W_h s_h^2} \tag{5.44}$$

If the fpc is ignored, we have, as a first approximation,

$$n_0 = \frac{1}{V} \sum \frac{W_h^2 s_h^2}{w_h} \tag{5.45}$$

If $n_0/N$ is not negligible, we may calculate $n$ as

$$n = \frac{n_0}{1 + \dfrac{1}{NV} \sum W_h s_h^2} \tag{5.46}$$

In particular cases the formulas take various forms that may be more convenient
for computation. A few are given.
Presumed optimum allocation (for fixed $n$): $w_h \propto W_h s_h$.

$$n = \frac{\left( \sum W_h s_h \right)^2}{V + \dfrac{1}{N} \sum W_h s_h^2} \tag{5.47}$$

Proportional allocation: $w_h = W_h = N_h/N$.

$$n_0 = \frac{\sum W_h s_h^2}{V}, \qquad n = \frac{n_0}{1 + \dfrac{n_0}{N}} \tag{5.48}$$
Estimation of the Population Total


If $V$ is the desired $V(\hat{Y}_{st})$, the principal formulas are as follows.
General:

$$n = \frac{\sum \dfrac{N_h^2 s_h^2}{w_h}}{V + \sum N_h s_h^2} \tag{5.49}$$

Presumed optimum (for fixed $n$):

$$n = \frac{\left( \sum N_h s_h \right)^2}{V + \sum N_h s_h^2} \tag{5.50}$$

Proportional:

$$n_0 = \frac{N}{V} \sum N_h s_h^2, \qquad n = \frac{n_0}{1 + \dfrac{n_0}{N}} \tag{5.51}$$

Example. This example comes from a paper by Cornell (1947), which describes a
sample of United States colleges and universities drawn in 1946 by the U.S. Office of
Education in order to estimate enrollments for the 1946-1947 academic year. The
illustration is for the population of 196 teachers' colleges and normal schools. These were
arranged in seven strata, of which one small stratum will be ignored. The first five strata
were constructed by size of institution: the sixth contained colleges for women only.
Estimates $s_h$ of the $S_h$ were computed from results for the 1943-1944 academic year. An
"optimum" stratification based on these $s_h$ was employed.

The objective was a coefficient of variation of 5% in the estimated total enrollment. In
1943 the total enrollment for this group of colleges was 56,472. Thus the desired standard
error is

(0.05)(56,472) = 2824

so that the desired variance is

$V = (2824)^2 = 7{,}974{,}976$

It may be objected that enrollments will be greater in 1946 than in 1943 and that
allowance should be made for this increase. Actually, the calculation assumes only that the
cv per college remains the same in 1943 and 1946, an assumption that may not be
unreasonable.
Table 5.4 shows the $N_h$ and $s_h$, which were known before determining $n$.
The appropriate formula for $n$ is (5.50), which applies to an "optimum" allocation for
estimating a total. In this population it is improbable that the fpc will be
negligible. However, for purposes of illustration, a first approximation ignoring the fpc will
be sought. This is

$$n_0 = \frac{\left( \sum N_h s_h \right)^2}{V} = \frac{(26{,}841)^2}{7{,}974{,}976} = 90.34$$
TABLE 5.4
DATA FOR ESTIMATING SAMPLE SIZE

Stratum   $N_h$   $s_h$   $N_h s_h$   $n_h$
   1       13    325     4,225      9
   2       18    190     3,420      7
   3       26    189     4,914     10
   4       42     82     3,444      7
   5       73     86     6,278     13
   6       24    190     4,560     10
Totals    196           26,841     56


Adjustment is obviously needed. For the correct $n$ in (5.50), we have

$$n = \frac{n_0}{1 + \dfrac{1}{V} \sum N_h s_h^2} = \frac{90.34}{1 + \dfrac{4{,}640{,}387}{7{,}974{,}976}} = 57$$

A sample size of 56 was chosen.* The $n_h$ for individual strata appear in the right-hand
column of Table 5.4.
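The arithmetic of this example can be reproduced from the $N_h$ and $s_h$ of Table 5.4. The sketch below applies formula (5.50) for the sample size and formula (5.26) for the allocation; variable names are illustrative.

```python
# Stratum sizes and estimated standard deviations from Table 5.4.
N_h = [13, 18, 26, 42, 73, 24]
s_h = [325, 190, 189, 82, 86, 190]

V = 2824 ** 2                                  # desired variance of the total
sum_Ns = sum(N * s for N, s in zip(N_h, s_h))            # sum of N_h s_h
sum_Ns2 = sum(N * s ** 2 for N, s in zip(N_h, s_h))      # sum of N_h s_h^2

n0 = sum_Ns ** 2 / V            # first approximation, fpc ignored
n = n0 / (1 + sum_Ns2 / V)      # formula (5.50)

# Presumed optimum allocation of the chosen sample of 56 units.
chosen = 56
n_h = [round(chosen * N * s / sum_Ns) for N, s in zip(N_h, s_h)]

print(round(n0, 2), round(n, 1))   # 90.34 57.1
print(n_h)                         # [9, 7, 10, 7, 13, 10]
```

The rounded allocation matches the right-hand column of Table 5.4.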


5.10 STRATIFIED SAMPLING FOR PROPORTIONS 


If we wish to estimate the proportion of units in the population that fall into
some defined class C, the ideal stratification is attained if we can place in the first
stratum every unit that falls in C, and in the second every unit that does not.
Failing this, we try to construct strata such that the proportion in class C varies as
much as possible from stratum to stratum.

Let

$$P_h = \frac{A_h}{N_h}, \qquad p_h = \frac{a_h}{n_h}$$

be the proportions of units in C in the $h$th stratum and in the sample from that
stratum, respectively. For the proportion in the whole population, the estimate
appropriate to stratified random sampling is

$$p_{st} = \sum \frac{N_h p_h}{N} \tag{5.52}$$

Theorem 5.9. With stratified random sampling, the variance of $p_{st}$ is

$$V(p_{st}) = \frac{1}{N^2} \sum \frac{N_h^2 (N_h - n_h)}{N_h - 1} \, \frac{P_h Q_h}{n_h} \tag{5.53}$$

* The arithmetical results differ slightly from those given by Cornell (1947).
Proof. This is a particular case of the general theorem for the variance of the
estimated mean. From theorem 5.3

$$V(\bar{y}_{st}) = \frac{1}{N^2} \sum N_h (N_h - n_h) \frac{S_h^2}{n_h} \tag{5.54}$$

Let $y_{hi}$ be a variate which has the value 1 when the unit is in C, and zero otherwise.
In section 3.2, equation 3.4, it was shown that for this variate

$$S_h^2 = \frac{N_h}{N_h - 1} P_h Q_h \tag{5.55}$$

This gives the result.


Note. In nearly all applications, even if the fpc is not negligible, terms in $1/N_h$
will be negligible, and the slightly simpler formula

$$V(p_{st}) = \frac{1}{N^2} \sum N_h (N_h - n_h) \frac{P_h Q_h}{n_h} = \sum \frac{W_h^2 P_h Q_h}{n_h} (1 - f_h) \tag{5.56}$$

can be used.
Corollary 1. When the fpc can be ignored,

$$V(p_{st}) = \sum \frac{W_h^2 P_h Q_h}{n_h} \tag{5.57}$$

Corollary 2. With proportional allocation,

$$V(p_{st}) = \frac{N - n}{N} \, \frac{1}{nN} \sum \frac{N_h^2 P_h Q_h}{N_h - 1} \tag{5.58}$$

$$\doteq \frac{1 - f}{n} \sum W_h P_h Q_h \tag{5.59}$$

For the sample estimate of the variance, substitute $p_h q_h/(n_h - 1)$ for the
unknown $P_h Q_h/n_h$ in any of the formulas above.

The best choice of the $n_h$ in order to minimize $V(p_{st})$ follows from the general
theory in section 5.5.


Minimum Variance for Fixed Total Sample Size.

$$n_h \propto N_h \sqrt{N_h/(N_h - 1)} \sqrt{P_h Q_h} \doteq N_h \sqrt{P_h Q_h}$$

Thus

$$n_h \doteq n \frac{N_h \sqrt{P_h Q_h}}{\sum N_h \sqrt{P_h Q_h}} \tag{5.60}$$

Minimum Variance for Fixed Cost, where Cost $= c_0 + \sum c_h n_h$:

$$n_h \doteq n \frac{N_h \sqrt{P_h Q_h / c_h}}{\sum N_h \sqrt{P_h Q_h / c_h}} \tag{5.61}$$

The value of $n$ is found as in section 5.5.


5.11 GAINS IN PRECISION IN STRATIFIED 
SAMPLING FOR PROPORTIONS 


If the costs per unit are the same in all strata, two useful working rules are that 
(a) the gain in precision from stratified random over simple random sampling is 
small or modest unless the P, vary greatly from stratum to stratum, and (b) 
optimum allocation for fixed n gains little over proportional allocation if all P, lie 
between 0.1 and 0.9. 


TABLE 5.5
RELATIVE PRECISION OF STRATIFIED AND SIMPLE RANDOM SAMPLING

                  Simple                Stratified
$P_h$               $nV(p)/(1-f)$         $nV(p_{st})/(1-f)$                 Relative
                  $= PQ$                $= \frac{1}{3} \sum P_h Q_h$            Precision (%)
0.4, 0.5, 0.6     2500                  2433                           103
0.3, 0.5, 0.7     2500                  2233                           112
0.2, 0.5, 0.8     2500                  1900                           132
0.1, 0.5, 0.9     2500                  1433                           174


To illustrate the first result, Table 5.5 compares stratified random sampling
(proportional allocation) with simple random sampling for three strata of equal
sizes ($W_h = \frac{1}{3}$). Four cases are included, the first having $P_h$ = 0.4, 0.5, and 0.6 in the
three strata and the last (and most extreme) having $P_h$ = 0.1, 0.5, and 0.9. The next
two columns show the variances of the estimated proportion, multiplied by
$n/(1 - f)$, and the last gives the relative precisions of stratified to simple random
sampling. The gain in precision is large only in the last two cases.
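The relative precisions in Table 5.5 follow from two short expressions: $PQ$ for simple random sampling and $\sum W_h P_h Q_h$ for proportional stratification, both apart from the common factor $(1 - f)/n$. The function below is an illustrative sketch that assumes strata of equal size, as in the table.

```python
def relative_precision(P_h):
    """Relative precision (%) of proportional stratification to simple
    random sampling, for strata of equal size with proportions P_h."""
    W = 1 / len(P_h)
    P = sum(W * p for p in P_h)                    # over-all proportion
    simple = P * (1 - P)                           # n V(p) / (1 - f)
    stratified = sum(W * p * (1 - p) for p in P_h) # n V(p_st) / (1 - f)
    return 100 * simple / stratified

for case in ([0.4, 0.5, 0.6], [0.3, 0.5, 0.7], [0.2, 0.5, 0.8], [0.1, 0.5, 0.9]):
    print(case, round(relative_precision(case)))
# prints 103, 112, 132 and 174, the last column of Table 5.5
```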

To compare proportional with optimum allocation for fixed $n$, it will be found
that apart from the multiplier $(1 - f)$,

$$V_{opt} = \frac{\left( \sum W_h \sqrt{P_h Q_h} \right)^2}{n}, \qquad V_{prop} = \frac{\sum W_h P_h Q_h}{n} \tag{5.62}$$
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The relative precision of proportional to optimum allocation is therefore 


Voge _ Œ W, P.Qs)? 
Vorop E WaPrQn 


If all P, lie between the two values Po and (1-—P,), we are interested in the 
smallest value the relative precision will take. For simplicity, we consider two 
strata of equal size (W, = W2). The minimum relative precision is attained when 
P,=4 and P= Po. The relative precision then becomes 


Voor _ (0.5+ VPoQ0)? 


opt _ 


Vprop 2(0.25+ PoQo) 


Some values of this function are given in table 5.6. Even with Pp equal to 0.1, or as 
high as 0.9, the relative precision is 94%. In most cases the simplicity and the 
self-weighting feature of proportional stratification more than compensate for this 
slight loss in precision. 

The limitations of the example should be noted. It does not take account of 
differential costs of sampling in different strata, In some surveys the P, are very 
small, but they range from, say, 0.001 to 0.05 in different strata. Here there would 
be a more substantial gain from optimum stratification. 


(5.63) 


(5.64) 


TABLE 5.6 


RELATIVE PRECISION OF PROPORTIONAL TO OPTIMUM ALLOCATION 


Po 0.40r0.6 0.30r0.7 0.20r0.8 0.1 0r0.9 0.05 or 0.95 
RP(%) 


100.0 99:8 98.8 94.1 86.6 
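
The values in Table 5.6 follow directly from formula (5.64). As a check, the following sketch evaluates that function for the tabulated $P_0$; the function name `rp_prop_to_opt` is our own label, not notation from the text.

```python
import math


def rp_prop_to_opt(p0):
    """Relative precision of proportional to optimum allocation, formula (5.64).

    Two strata of equal size with P_1 = 1/2 and P_2 = p0.
    """
    pq = p0 * (1.0 - p0)
    return (0.5 + math.sqrt(pq)) ** 2 / (2.0 * (0.25 + pq))


for p0 in (0.4, 0.3, 0.2, 0.1, 0.05):
    print(p0, round(100 * rp_prop_to_opt(p0), 1))
```

Because the function depends on $P_0$ only through $P_0 Q_0$, the values for $P_0$ and $1 - P_0$ coincide, which is why each column of the table carries two labels.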


5.12 ESTIMATION OF SAMPLE SIZE WITH 
PROPORTIONS 


Formulas can be deduced from the more general formulas in section 5.9. Let V
be the desired variance in the estimate of the proportion P for the whole
population. The formulas for the two principal types of allocation are as follows:

Proportional:

$$n_0 = \frac{\sum W_h p_h q_h}{V}, \qquad n = \frac{n_0}{1 + \dfrac{n_0}{N}} \tag{5.65}$$

Presumed optimum:

$$n_0 = \frac{\left(\sum W_h \sqrt{p_h q_h}\right)^2}{V}, \qquad n = \frac{n_0}{1 + \dfrac{1}{NV}\sum W_h p_h q_h} \tag{5.66}$$

where $n_0$ is the first approximation, which ignores the fpc, and n is the
corrected value taking account of the fpc. In the development of these formulas,
the factors $N_h/(N_h - 1)$ have been taken as unity.

These results apply to the estimate of a proportion. If it is preferable to think
in terms of percentages, the same formulas apply if $p_h$, $q_h$, $V$, and so forth,
are expressed as percentages. For the estimation of the total number in the
population in class C, that is, of NP, all variances are multiplied by $N^2$.
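
As an illustration of formulas (5.65) and (5.66), the sketch below computes the first approximation $n_0$ and the corrected n for both types of allocation. The function names and the numerical inputs (three strata, a desired standard error of 0.02, N = 5000) are hypothetical and chosen only to show the arithmetic.

```python
import math


def sample_size_proportional(W, p, V, N=None):
    """Formula (5.65): n0 = sum(W_h p_h q_h) / V, then n = n0 / (1 + n0 / N)."""
    n0 = sum(w * ph * (1.0 - ph) for w, ph in zip(W, p)) / V
    return n0 if N is None else n0 / (1.0 + n0 / N)


def sample_size_presumed_optimum(W, p, V, N=None):
    """Formula (5.66): n0 = (sum W_h sqrt(p_h q_h))^2 / V, with fpc correction."""
    n0 = sum(w * math.sqrt(ph * (1.0 - ph)) for w, ph in zip(W, p)) ** 2 / V
    if N is None:
        return n0
    wpq = sum(w * ph * (1.0 - ph) for w, ph in zip(W, p))
    return n0 / (1.0 + wpq / (N * V))


# Hypothetical advance estimates, for illustration only.
W = [0.5, 0.3, 0.2]      # stratum weights W_h
p = [0.1, 0.3, 0.6]      # advance estimates p_h
V = 0.02 ** 2            # desired variance (standard error 0.02)
N = 5000                 # population size

print(sample_size_proportional(W, p, V), sample_size_proportional(W, p, V, N))
print(sample_size_presumed_optimum(W, p, V), sample_size_presumed_optimum(W, p, V, N))
```

In practice the results would be rounded up to integers, and the stratum sample sizes would then follow from the chosen allocation.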


EXERCISES 


5.1 In a population with N = 6 and L = 2 the values of $y_{hi}$ are 0, 1, 2 in
stratum 1 and 4, 6, 11 in stratum 2. A sample with n = 4 is to be taken. (a) Show
that the optimum $n_h$ under Neyman allocation, when rounded to integers, are
$n_h = 1$ in stratum 1 and $n_h = 3$ in stratum 2. (b) Compute the estimate
$\bar{y}_{st}$ for every possible sample that can be drawn under optimum allocation
and under proportional allocation. Verify that the estimates are unbiased. Hence
find $V_{opt}(\bar{y}_{st})$ and $V_{prop}(\bar{y}_{st})$ directly. (c) Verify that
$V_{opt}(\bar{y}_{st})$ agrees with the formula given in equation (5.6) and that
$V_{prop}(\bar{y}_{st})$ agrees with the formula given in equation (5.8), page 93.
(d) Use of formula (5.27), page 99, to compute $V_{opt}(\bar{y}_{st})$ is slightly
incorrect because it does not allow for the fact that the $n_h$ were rounded to
integers. How well does it agree with the corrected value?

5.2 The households in a town are to be sampled in order to estimate the average 
amount of assets per household that are readily convertible into cash. The households are 
stratified into a high-rent and a low-rent stratum. A house in the high-rent
stratum is thought to have about nine times as much assets as one in the low-rent
stratum, and $S_h$ is expected to be proportional to the square root of the stratum
mean.

There are 4000 households in the high-rent stratum and 20,000 in the low-rent stratum. 
(a) How would you distribute a sample of 1000 households between the two strata? (b) If 
the object is to estimate the difference between assets per household in the two strata, how 
should the sample be distributed? 

5.3 The following data show the stratification of all the farms in a county by
farm size and the average acres of corn (maize) per farm in each stratum. For a
sample of 100 farms, compute the sample sizes in each stratum under (a)
proportional allocation, (b) optimum allocation. Compare the precisions of these
methods with that of simple random sampling.


               Number of    Average       Standard
Farm Size      Farms        Corn Acres    Deviation
(acres)        $N_h$        $\bar{Y}_h$   $S_h$
0-40           394          5.4           8.3
41-80          461          16.3          13.3
81-120         391          24.3          15.1
121-160        334          34.5          19.8
161-200        169          42.1          24.5
201-240        113          50.1          26.0
241-           148          63.8          35.2

Total or mean  2010         26.3

5.4 Prove the result stated in formula (5.38), section 5.6:

$$V_{ran} = V_{prop} + \frac{(1-f)}{n(N-1)}\left[\sum N_h(\bar{Y}_h - \bar{Y})^2 - \frac{1}{N}\sum (N - N_h) S_h^2\right]$$


5.5 A sampler has two strata with relative sizes $W_1$, $W_2$. He believes that
$S_1$, $S_2$ can be taken as equal but thinks that $c_2$ may be between $2c_1$ and
$4c_1$. He would prefer to use proportional allocation but does not wish to incur a
substantial increase in variance compared with optimum allocation. For a given
cost $C = c_1 n_1 + c_2 n_2$, ignoring the fpc, show that

$$\frac{V_{prop}(\bar{y}_{st})}{V_{opt}(\bar{y}_{st})} = \frac{W_1 c_1 + W_2 c_2}{\left(W_1\sqrt{c_1} + W_2\sqrt{c_2}\right)^2}$$

If $W_1 = W_2$, compute the relative increases in variance from using proportional
allocation when $c_2/c_1 = 2, 4$.


5.6 A sampler proposes to take a stratified random sample. He expects that his
field costs will be of the form $\sum c_h n_h$. His advance estimates of relevant
quantities for the two strata are as follows.


Stratum    $W_h$    $S_h$    $c_h$
1          0.4      10       $4
2          0.6      20       $9


(a) Find the values of $n_1/n$ and $n_2/n$ that minimize the total field cost for
a given value of $V(\bar{y}_{st})$. (b) Find the sample size required, under this
optimum allocation, to make $V(\bar{y}_{st}) = 1$. Ignore the fpc. (c) How much will
the total field cost be?

5.7 After the sample in exercise 5.6 is taken, the sampler finds that his field
costs were actually $2 per unit in stratum 1 and $12 in stratum 2. (a) How much
greater is the field cost than anticipated? (b) If he had known the correct field
costs in advance, could he have attained $V(\bar{y}_{st}) = 1$ for the original
estimated field cost in exercise 5.6? (Hint. The Cauchy-Schwarz inequality, page
97, with V' = 1, gives the answer to this question without finding the new
allocation.)


5.8 In a stratification with two strata, the values of the $W_h$ and $S_h$ are as
follows.


Stratum    $W_h$    $S_h$
1          0.8      2
2          0.2      4


Compute the sample sizes $n_1$, $n_2$ in the two strata needed to satisfy the
following conditions. Each case requires a separate computation. (Ignore the fpc.)
(a) The standard error of the estimated population mean $\bar{y}_{st}$ is to be 0.1
and the total sample size $n = n_1 + n_2$ is to be minimized. (b) The standard
error of the estimated mean of each stratum is to be 0.1. (c) The standard error of
the difference between the two estimated stratum means is to be 0.1, again
minimizing the total size of sample.

5.9 With two strata, a sampler would like to have $n_1 = n_2$ for administrative
convenience, instead of using the values given by the Neyman allocation. If
$V(\bar{y}_{st})$, $V_{opt}(\bar{y}_{st})$ denote the variances given by the
$n_1 = n_2$ and the Neyman allocations, respectively, show that the fractional
increase in variance is

$$\frac{V(\bar{y}_{st}) - V_{opt}(\bar{y}_{st})}{V_{opt}(\bar{y}_{st})} = \left(\frac{r-1}{r+1}\right)^2$$

where $r = n_1/n_2$ as given by Neyman allocation. For the strata in exercise 5.8,
case a, what would the fractional increase in variance be by using $n_1 = n_2$
instead of the optimum?


5.10 If the cost function is of the form $C = c_0 + \sum t_h \sqrt{n_h}$, where
$c_0$ and the $t_h$ are known numbers, show that in order to minimize
$V(\bar{y}_{st})$ for fixed total cost $n_h$ must be proportional to

$$\left(\frac{W_h^2 S_h^2}{t_h}\right)^{2/3}$$

Find the $n_h$ for a sample of size 1000 under the following conditions.


Stratum    $W_h$    $S_h$    $t_h$
1          0.4      4        1
2          0.3      5        2
3          0.2      6        4


5.11 If $V_{prop}(\bar{y}_{st})$ is the variance of the estimated mean from a
stratified random sample of size n with proportional allocation and $V(\bar{y})$ is
the variance of the mean of a simple random sample of size n, show that the ratio

$$\frac{V_{prop}(\bar{y}_{st})}{V(\bar{y})}$$

does not depend on the size of sample but that the ratio

$$\frac{V_{min}(\bar{y}_{st})}{V_{prop}(\bar{y}_{st})}$$

decreases as n increases. (This implies that optimum allocation for fixed n
becomes more effective in relation to proportional allocation as n increases.)
[Use formulas (5.8) and (5.27).]

5.12 Compare the values obtained for $V(p_{st})$ under proportional allocation and
optimum allocation for fixed sample size in the following two populations. Each
stratum is of equal size. The fpc may be ignored.


    Population 1            Population 2
Stratum    $P_h$        Stratum    $P_h$
1          0.1          1          0.01
2          0.5          2          0.05
3          0.9          3          0.10


What general result is illustrated by these two populations?

5.13 Show that in the estimation of proportions the results corresponding to
theorem 5.8 are as follows.

$$V_{ran} = V_{prop} + \frac{(1-f)}{n}\sum W_h (P_h - P)^2$$

$$V_{prop} = V_{opt} + \frac{1}{n}\sum W_h\left(\sqrt{P_h Q_h} - \overline{\sqrt{P_h Q_h}}\right)^2$$

where

$$\overline{\sqrt{P_h Q_h}} = \sum W_h \sqrt{P_h Q_h}$$


5.14 In a firm, 62% of the employees are skilled or unskilled males, 31% are
clerical females, and 7% are supervisory. From a sample of 400 employees the firm
wishes to estimate the proportion that uses certain recreational facilities. Rough
guesses are that the facilities are used by 40 to 50% of the males, 20 to 30% of
the females, and 5 to 10% of the supervisors. (a) How would you allocate the sample
among the three groups? (b) If the true proportions of users were 48, 21, and 4%,
respectively, what would the s.e. of the estimated proportion $p_{st}$ be? (c) What
would the s.e. of p be from a simple random sample with n = 400?


5.15 Formula (5.27) for the minimum variance of $\bar{y}_{st}$ under Neyman
allocation reads as follows.

$$V_{min}(\bar{y}_{st}) = \frac{\left(\sum W_h S_h\right)^2}{n} - \frac{\sum W_h S_h^2}{N}$$

A student comments: "Since $\sum W_h S_h^2 > (\sum W_h S_h)^2$ unless all the $S_h$
are equal, the formula must be wrong because as n approaches N it will give a
negative value for $V(\bar{y}_{st})$." Is the formula or the student wrong?


5.16 By formula (5.26) for Neyman allocation, the sampling fraction in stratum h
is $f_h = n_h/N_h = nS_h/(N \sum W_h S_h)$. The situations in which this formula
calls for more than 100% sampling in a stratum ($f_h > 1$) are therefore likely to
be those in which the overall sampling fraction n/N is fairly substantial and one
stratum has unusually high variability. The following is an example for a small
population, with N = 100, n = 40.


                               Optimum
Stratum    $N_h$    $S_h$      $n_h$
1          60       2          15
2          30       4          15
3          10       15         10
           100                 40


(a) Verify that the optimum $n_h$ are as shown in the right column. (b) Calculate
$V_{min}(\bar{y}_{st})$ by formula (5.6) and by formula (5.42) and show that both
give $V(\bar{y}_{st}) = 0.12$.


CHAPTER 5A. 


Further Aspects 
of Stratified Sampling 


5A.1 EFFECTS OF DEVIATIONS FROM THE OPTIMUM 
ALLOCATION 


This chapter discusses a number of special topics in the practical use of stratified
sampling. Sections 5A.1 to 5A.8, 5A.10, and 5A.15 deal with problems that may
come up in the planning of the sample; the remaining sections deal with techniques
of analysis of results. The present section considers the loss in precision by
failure to achieve an optimum allocation of the sample.

Suppose that it is intended to use optimum allocation for given n. The sample
size $n_h'$ in stratum h should be

$$n_h' = \frac{n\,W_h S_h}{\sum W_h S_h} \tag{5A.1}$$

From equation (5.27), page 99, the resulting minimum variance is

$$V_{min}(\bar{y}_{st}) = \frac{1}{n}\left(\sum W_h S_h\right)^2 - \frac{1}{N}\sum W_h S_h^2 \tag{5A.2}$$


In practice, since the $S_h$ are not known, we can only approximate this
allocation. If $\hat{n}_h$ is the sample size used in stratum h, the variance
actually attained, from equation (5.6), page 92, is

$$V(\bar{y}_{st}) = \sum \frac{W_h^2 S_h^2}{\hat{n}_h} - \frac{1}{N}\sum W_h S_h^2 \tag{5A.3}$$

The increase in variance caused by the imperfect allocation is

$$V(\bar{y}_{st}) - V_{min}(\bar{y}_{st}) = \sum \frac{W_h^2 S_h^2}{\hat{n}_h} - \frac{1}{n}\left(\sum W_h S_h\right)^2 \tag{5A.4}$$

In the first term on the right substitute for $W_h S_h$ in terms of $n_h'$ from
(5A.1). This gives the interesting result

$$V(\bar{y}_{st}) - V_{min}(\bar{y}_{st}) = \frac{\left(\sum W_h S_h\right)^2}{n^2}\left[\sum \frac{n_h'^2}{\hat{n}_h} - n\right] = \frac{\left(\sum W_h S_h\right)^2}{n^2}\sum \frac{(\hat{n}_h - n_h')^2}{\hat{n}_h} \tag{5A.5}$$

Reverting to equation (5A.2), if the fpc (last term on the right) is negligible, we
see that

$$V_{min}(\bar{y}_{st}) = \frac{1}{n}\left(\sum W_h S_h\right)^2 \tag{5A.6}$$

Hence the proportional increase in variance resulting from deviations from the
optimum allocation is

$$\frac{V(\bar{y}_{st}) - V_{min}(\bar{y}_{st})}{V_{min}(\bar{y}_{st})} = \frac{1}{n}\sum \frac{(\hat{n}_h - n_h')^2}{\hat{n}_h} \tag{5A.7}$$

where $\hat{n}_h$ is the actual and $n_h'$ the optimum sample size in stratum h. If
the fpc is not negligible, the = sign in (5A.7) becomes $\ge$.

Let $g_h = |\hat{n}_h - n_h'|/\hat{n}_h$ be the absolute difference in the sample
sizes in stratum h, expressed as a fraction of the actual sample size $\hat{n}_h$.
Then (5A.7) becomes

$$\frac{V - V_{min}}{V_{min}} = \frac{1}{n}\sum \hat{n}_h g_h^2 \tag{5A.8}$$

a weighted mean of the $g_h^2$. A conservative upper limit to
$(V - V_{min})/V_{min}$ is therefore $g^2$, where g is the largest proportional
difference in any stratum. Thus, if g = 0.2 or 20%, the proportional increase in
variance cannot exceed $(0.2)^2$ or 4%. If g = 30% the proportional increase in
variance is at most 9%. In this sense the optimum can be described as flat.


TABLE 5A.1
EFFECTS OF DEVIATIONS FROM OPTIMUM ALLOCATION

            $n_h'$    $\hat{n}_h$    $g_h =$
Stratum     (opt)     (act)          $|\hat{n}_h - n_h'|/\hat{n}_h$    $(\hat{n}_h - n_h')^2/\hat{n}_h$
1           200       150            0.33                              16.7
2           100       120            0.17                              3.3
3           40        70             0.43                              12.9
Total       340       340                                              32.9

Furthermore, (5A.8) suggests that the upper limit $g^2$ will often overestimate the
proportional increase in variance by a substantial amount. Table 5A.1 gives an
example with three strata for n = 340. Optimum allocation requires sample sizes
of 200, 100, and 40, whereas the sizes actually used are 150, 120, and 70.

Since the value of g is 0.43 (stratum 3), the rough rule gives 18% as the
proportional increase in variance. From the column on the right, the actual
increase is seen to be 32.9/340 = 9.7%.
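
The comparison between the exact proportional increase (5A.7) and the rough upper limit $g^2$ is easy to repeat for other allocations. The sketch below uses the figures of Table 5A.1; the function names are ours, and the fpc is ignored as in (5A.7).

```python
def proportional_increase(n_opt, n_act):
    """Right side of (5A.7): (1/n) * sum (n_act - n_opt)^2 / n_act."""
    n = sum(n_act)
    return sum((a - o) ** 2 / a for o, a in zip(n_opt, n_act)) / n


def rough_upper_limit(n_opt, n_act):
    """g^2, where g is the largest value of |n_act - n_opt| / n_act."""
    g = max(abs(a - o) / a for o, a in zip(n_opt, n_act))
    return g ** 2


n_opt = [200, 100, 40]   # optimum sample sizes from Table 5A.1
n_act = [150, 120, 70]   # sample sizes actually used

increase = proportional_increase(n_opt, n_act)
limit = rough_upper_limit(n_opt, n_act)
print(f"actual increase {100 * increase:.1f}%, upper limit {100 * limit:.1f}%")
```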

Evans (1951) examined the same question in terms of the effects of errors in the
estimated $S_h$ and developed an approximate rule showing whether an estimated
optimum is likely to be more precise than proportional allocation. He supposes
that the coefficient of variation of the estimated $S_h$ is the same in all strata.
This assumption is appropriate when the $S_h$ have been estimated from a
preliminary sample of the same size in each stratum. He shows how to compute the
size of a preliminary sample needed to make an "optimum" allocation better, on the
average, than proportional allocation. Previously, Sukhatme (1935) showed that a
small initial sample usually gives a high probability that "optimum" allocation
will be superior to proportional stratification. See Sukhatme and Sukhatme (1970),
p. 88.


5A.2 EFFECTS OF ERRORS IN THE STRATUM SIZES 


For a desirable type of stratification, the stratum totals $N_h$ may not be known
exactly, being derived from census data that are out of date. Instead of the true
stratum proportions $W_h$, we have estimates $w_h$. The sample estimate of
$\bar{Y}$ is $\sum w_h \bar{y}_h$.

In general terms, the consequences of using weights that are in error are as 
follows. 


1. The sample estimate is biased. Because of the bias, we measure the accuracy
of the estimate by its mean square error about $\bar{Y}$ rather than by its
variance about its own mean (see section 1.9).

2. The bias remains constant as the sample size increases. Consequently, a size
of sample is always reached for which the estimate is less accurate than simple
random sampling, and all the gain in precision from stratification is lost.

3. The usual estimate $s(\bar{y}_{st})$ underestimates the true error of
$\bar{y}_{st}$, since it does not contain the contribution of the bias to the error.


To justify these statements, note that in repeated sampling the mean value of
the estimate is $\sum w_h \bar{Y}_h$. The bias therefore amounts to

$$\sum (w_h - W_h)\,\bar{Y}_h$$

It is independent of the size of the sample. In finding the mean square error
(MSE) of the estimate, it is easy to verify that the variance term is given by the
usual formula, with $w_h$ in place of $W_h$. Hence

$$MSE(\bar{y}_{st}) = \sum \frac{w_h^2 S_h^2}{n_h}(1 - f_h) + \left[\sum (w_h - W_h)\,\bar{Y}_h\right]^2 \tag{5A.9}$$

This expression was given by Stephan (1941). Finally, the usual formula for
$s^2(\bar{y}_{st})$ is clearly an unbiased estimate of the first term in (5A.9) but
takes no account of the second term.


Example. This illustrates the loss of precision from incorrect weights when
stratification is (a) slightly effective, (b) highly effective. Consider a large
population with $S^2 = 1$, divisible into two strata with $W_1 = 0.9$, $W_2 = 0.1$.
We will assume $S_1 = S_2 = S_h$. Then, neglecting terms in $1/N_h$,

$$S^2 = \sum W_h S_h^2 + \sum W_h (\bar{Y}_h - \bar{Y})^2 = S_h^2 + W_1 W_2 (\bar{Y}_1 - \bar{Y}_2)^2 \tag{5A.10}$$

that is,

$$1 = S_h^2 + 0.09\,(\bar{Y}_1 - \bar{Y}_2)^2$$

In (a) take $\bar{Y}_1 - \bar{Y}_2 = 1$. Then $S_h^2 = 0.91$, and proportional
stratification with correct weights reduces the variance by 9%, compared with simple
random sampling.

In (b) take $\bar{Y}_1 - \bar{Y}_2 = 3$, giving $S_h^2 = 0.19$, a reduction in
variance of more than 80%.

With two strata and incorrect weights, the bias may be written

$$(w_1 - W_1)(\bar{Y}_1 - \bar{Y}_2)$$

since $(w_2 - W_2) = -(w_1 - W_1)$. Suppose that the estimated weights are
$w_1 = 0.92$ and $w_2 = 0.08$. The bias amounts to (0.02)(1) = 0.02 in (a) and to
0.06 in (b). Hence we have the following comparable MSE's for a sample of size n.


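Simple random sampling:

$$V(\bar{y}) = \frac{1}{n}$$

Stratified random sampling:

$$\text{(a):}\quad MSE(\bar{y}_{st}) = \frac{0.91}{n} + 0.0004$$

$$\text{(b):}\quad MSE(\bar{y}_{st}) = \frac{0.19}{n} + 0.0036$$

As Table 5A.2 shows, simple random sampling begins to win relative to (a) at
n = 300. There is little to choose between them, however, up to n = 1000. In (b),
with more at stake, stratification is superior up to n = 200, although most of the
gain in precision has been lost by then. Beyond n = 300 stratification is clearly
inferior to simple random sampling.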

TABLE 5A.2
COMPARABLE VALUES OF MSE($\bar{y}$)

          Simple      Stratified Random
n         Random      (a)         (b)
50        0.0200      0.0186      0.0074
100       0.0100      0.0095      0.0055
200       0.0050      0.0049      0.0045
300       0.0033      0.0034      0.0042
400       0.0025      0.0027      0.0041
1000      0.0010      0.0013      0.0038


In some surveys a large preliminary sample of size n' can be taken in order to
estimate the $W_h$. This technique, known as double sampling or two-phase
sampling, has numerous applications and is discussed in Chapter 12. It will be
shown that with double sampling the mean square error of $\bar{y}_{st}$ is
approximately

$$\frac{\sum W_h S_h^2}{n} + \frac{\sum W_h (\bar{Y}_h - \bar{Y})^2}{n'} \tag{5A.11}$$

By comparing this MSE with $S^2/n$, as given by equation (5A.10), we see that most
of the gain from stratification is retained provided that n' is much greater than n.
To put it more generally, a set of estimated weights preserves most of the potential
gain from stratification if the weights are much more accurately estimated than
they would be from a simple random sample of size n.


5A.3 THE PROBLEM OF ALLOCATION WITH MORE 
THAN ONE ITEM 


Since the best allocation for one item will not in general be best for another, 
some compromise must be reached in a survey with numerous items. The first step 
is to reduce the items considered in the allocation to a relatively small number 
thought to be most important. If good previous data are available, we can then
compute the optimum allocation for each item separately and see to what extent 
there is disagreement. In a survey of a specialized type the correlations among the 
items may be high and the allocations may differ relatively little. 


Example. Data given by Jessen (1942) illustrate a farm survey of this kind. The
state of Iowa was divided into five geographic regions, each denoted by its major
agricultural enterprise. Suppose that these regions are to be used as strata in a
survey on dairy farming. The three items of most interest are the number of cows
milked per day, the number of gallons of milk per day, and the total annual cash
receipts from dairy products. From a survey made in 1938, the estimated standard
deviations $s_h$ within strata are shown in Table 5A.3. In Table 5A.4 the optimum
Neyman allocations based on these $s_h$ are given for the individual items in a
sample of 1000 farms.


TABLE 5A.3
STANDARD DEVIATIONS WITHIN STRATA

                                      $s_h$       $s_h$       $s_h$ Receipts
                                      Cows        Gallons     for Dairy
Stratum              $W_h = N_h/N$    Milked      of Milk     Products ($)
Northeast dairy      0.197            4.6         11.7        332
Cash grain           0.191            3.4         9.8         357
Western livestock    0.219            3.3         7.0         246
Southern pasture     0.184            2.8         6.5         173
Eastern livestock    0.208            3.7         9.8         279


TABLE 5A.4
SAMPLE SIZES WITHIN STRATA (n = 1000)

                                            Allocation
                                     Optimum for
Stratum              Proportional    Cows    Gallons    Receipts    Average
Northeast dairy      197             254     258        236         250
Cash grain           191             182     209        246         212
Western livestock    219             203     171        194         189
Southern pasture     184             145     134        115         131
Eastern livestock    208             216     228        209         218


TABLE 5A.5
EXPECTED VARIANCES OF THE ESTIMATED MEAN

Type of allocation   Cows      Gallons    Receipts
Optimum              0.0127    0.0800     76.9
Compromise           0.0128    0.0802     77.6
Proportional         0.0131    0.0837     80.9


The individual optimum allocations differ only moderately from each other. With one
exception, all three deviate in the same direction from a proportional allocation.
Thus, in the first stratum, proportional allocation suggests 197 farms, and the
individual allocations lead to numbers between 236 and 258. The average of the
optimum sample sizes for the three items, shown in the right-hand column, provides
a satisfactory compromise allocation.

Table 5A.5 shows the expected sampling variances of $\bar{y}_{st}$, as given by the
individual optima, the compromise, and the proportional allocations. The formulas
are as follows.

$$V_{opt} = \frac{\left(\sum W_h s_h\right)^2}{n}, \qquad V_{comp} = \sum \frac{W_h^2 s_h^2}{n_h}, \qquad V_{prop} = \frac{\sum W_h s_h^2}{n}$$

where, in $V_{comp}$, $n_h$ denotes the compromise sample size in stratum h.

The compromise allocation gives results almost as precise as if it were possible to use 
separate optimum allocations for each item. What is more noteworthy is that proportional 
allocation is only slightly less precise than the compromise or the individual optima. 
Furthermore, Table 5A.5 overestimates the precision of the optima and of the compro- 
mise, since these allocations were made from estimated variances. This result is another 
illustration of the flatness of the optimum mentioned in section 5A.1. 
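
The "Cows" columns of Tables 5A.4 and 5A.5 can be recomputed from the $W_h$ and $s_h$ of Table 5A.3. The sketch below is an illustration only: the function names are ours, the fpc is ignored, and the compromise sample sizes are copied from the right-hand column of Table 5A.4.

```python
def neyman_allocation(W, s, n):
    """Optimum allocation for fixed n: n_h proportional to W_h * s_h."""
    ws = [w * sd for w, sd in zip(W, s)]
    total = sum(ws)
    return [n * v / total for v in ws]


def variance_of_mean(W, s, n_h):
    """sum W_h^2 s_h^2 / n_h, the variance of the stratified mean without fpc."""
    return sum((w * sd) ** 2 / m for w, sd, m in zip(W, s, n_h))


W = [0.197, 0.191, 0.219, 0.184, 0.208]   # Table 5A.3
s_cows = [4.6, 3.4, 3.3, 2.8, 3.7]        # s_h for cows milked
n = 1000

optimum = neyman_allocation(W, s_cows, n)
compromise = [250, 212, 189, 131, 218]    # averages from Table 5A.4
proportional = [n * w for w in W]

v_opt = variance_of_mean(W, s_cows, optimum)
v_comp = variance_of_mean(W, s_cows, compromise)
v_prop = variance_of_mean(W, s_cows, proportional)
print([round(x) for x in optimum])
print(round(v_opt, 4), round(v_comp, 4), round(v_prop, 4))
```

The same two functions can be applied to the other two items by substituting the corresponding column of standard deviations.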


5A.4 OTHER METHODS OF ALLOCATION WITH 
MORE THAN ONE ITEM 


An alternative compromise allocation suggested by Chatterjee (1967) is to
choose the $n_h$ that minimize the average of the proportional increases in variance
from (5A.7), taken over the variables. If j denotes a variable this amounts to
choosing

$$n_h = n\sqrt{\sum_j n_{jh}'^2}\;\Big/\;\sum_h \sqrt{\sum_j n_{jh}'^2} \tag{5A.12}$$

where $n_{jh}'$ is the optimum sample size in stratum h for variable j. For the data
in Table 5A.4, where the individual optima differ only slightly, Chatterjee's $n_h$
vary from the averages in Table 5A.4 by, at most, one unit in any stratum.
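
As a check on this statement, the sketch below applies (5A.12) to the three individual optimum allocations of Table 5A.4 and compares the result with the averages in the same table. The function name `chatterjee_allocation` is ours.

```python
import math


def chatterjee_allocation(optima, n):
    """Compromise allocation (5A.12).

    optima[j][h] is the optimum sample size in stratum h for variable j.
    """
    n_strata = len(optima[0])
    roots = [math.sqrt(sum(opt[h] ** 2 for opt in optima)) for h in range(n_strata)]
    total = sum(roots)
    return [n * r / total for r in roots]


# Individual optimum allocations from Table 5A.4 (cows, gallons, receipts).
optima = [
    [254, 182, 203, 145, 216],
    [258, 209, 171, 134, 228],
    [236, 246, 194, 115, 209],
]
averages = [250, 212, 189, 131, 218]

n_h = [round(x) for x in chatterjee_allocation(optima, 1000)]
print(n_h)
print([a - b for a, b in zip(n_h, averages)])
```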

In some surveys the optimum allocations for individual variates differ so much 
that there is no obvious compromise. Some principle is needed to determine the 
allocation to be used. Two useful ones suggested by Yates (1960) are presented. 

The first applies to surveys with a specialized objective, in which the loss due to
an error of given size in an estimate can be measured in terms of money or utility,
as discussed in section 4.10. With k variates and quadratic loss functions, it may
be reasonable to express the total expected loss as a linear function of the
variances of the estimated population means or totals. For the means,

$$L = \sum_{j=1}^{k} a_j V(\bar{y}_{jst}) = \sum_{j=1}^{k} a_j \sum_h W_h^2 S_{jh}^2\left(\frac{1}{n_h} - \frac{1}{N_h}\right) \tag{5A.13}$$

where $S_{jh}^2$ is the variance of the jth variate in stratum h. Interchange of the
order of summation gives

$$L = \sum_h \frac{W_h^2}{n_h}\left(\sum_j a_j S_{jh}^2\right) - \frac{1}{N}\sum_h W_h\left(\sum_j a_j S_{jh}^2\right) \tag{5A.14}$$

With a linear function for the costs of sampling, we have

$$C = c_0 + \sum c_h n_h \tag{5A.15}$$

Minimizing the product of $(C - c_0)$ and the first term in L (the term depending
on the $n_h$) gives, by the Cauchy-Schwarz inequality,

$$n_h \propto \frac{W_h}{\sqrt{c_h}}\left(\sum_j a_j S_{jh}^2\right)^{1/2} \tag{5A.16}$$

The constant of proportionality is found by satisfying the constraint given for L
or C. For instance, suppose that the value of L is specified and that the fpc term
may be ignored. We have

$$n_h = n\,\frac{W_h A_h/\sqrt{c_h}}{\sum \left(W_h A_h/\sqrt{c_h}\right)} \tag{5A.17}$$

where $A_h = \sqrt{\sum_j a_j S_{jh}^2}$. The required total sample size is, from
(5A.14),

$$n = \frac{1}{L}\left(\sum W_h A_h \sqrt{c_h}\right)\left(\sum \frac{W_h A_h}{\sqrt{c_h}}\right) \tag{5A.18}$$


In the second approach we specify the desired variance $V_j$ for each variate. For
population means this implies that

$$\sum_h \frac{W_h^2 S_{jh}^2}{n_h} - \frac{1}{N}\sum_h W_h S_{jh}^2 \le V_j \qquad (j = 1, 2, \ldots, k) \tag{5A.19}$$

Inequality signs are used because the most economical allocation may supply
variances smaller than the desired $V_j$ for some items.

In this approach the cost C [equation (5A.15)] is minimized subject to the
tolerances $V_j$ and the conditions $0 \le n_h \le N_h$. The problem is one in
nonlinear programming. Algorithms for its solution have been given by Hartley and
Hocking (1963), Chatterjee (1966), Zukhovitsky and Avdeyeva (1966), and
Huddleston et al. (1970). Earlier, Dalenius (1957) gave an ingenious graphical
solution, while Yates (1960) and Kokan (1963) developed methods of successive
approximation, illustrated in the second edition of this book.

A useful first step is, of course, to work out the optimum allocation for each
variate separately and find the cost of satisfying its tolerance. Take the variate,
say $y_1$, for which the cost $C_1$ is highest and examine whether the optimum $n_h$
values for $y_1$ satisfy all the other (k - 1) tolerances. If so, we use this
allocation and the problem is solved, because no other allocation will satisfy the
tolerance $V_1$ for $y_1$ at a cost as low as $C_1$.

By working a series of examples in a related problem, Booth and Sedransk
(1969) have pointed out that in default of a computer program a good approximation
to the solution of Yates' second problem can often be obtained by solving the
easier first problem. Specify that L in (5A.13) shall have the value
$V^* = \sum a_j V_j$, where the $V_j$ are the desired individual tolerances and the
$a_j$ are made inversely proportional to the $V_j$. Thus with two variates,
$a_1 = V_2/(V_1 + V_2)$, $a_2 = V_1/(V_1 + V_2)$, and

$$V^* = \frac{2V_1V_2}{V_1 + V_2} \tag{5A.20}$$


Example. (Four strata, two variates.) The data and the application of the
approximate method are shown in columns (1) to (6) of Table 5A.6. The problem is to
find the smallest n for which

$$V(\bar{y}_{1st}) \le 0.04, \qquad V(\bar{y}_{2st}) \le 0.01$$


TABLE 5A.6
ARTIFICIAL DATA FOR FOUR STRATA, TWO VARIATES

Column     (1)      (2)          (3)          (4)                           (5)          (6)      (7)
Stratum    $W_h$    $S_{1h}^2$   $S_{2h}^2$   $A_h^2 = \sum a_j S_{jh}^2$   $W_h A_h$    $n_h$    $\hat{n}_h$
1          0.4      25           1            5.8                           0.963        206      194
2          0.3      25           4            8.2                           0.859        183      180
3          0.2      25           16           17.8                          0.844        180      187
4          0.1      25           64           56.2                          0.750        160      171

Totals                                                                      3.416        729      732


By working out the optimum allocation for each variate separately it is easily
verified that n = 625 is needed to satisfy the first constraint and n = 676 is
needed to satisfy the second. However, n = 676 with its allocation does not satisfy
the first constraint, giving 0.0589 instead of 0.04 for $V_1$. An iterative solution
to satisfy both constraints (presented in the second edition) gave $\hat{n} = 732$,
with the $\hat{n}_h$ shown in column (7) of Table 5A.6.

To use the Booth and Sedransk approach with $V_1 = 0.04$, $V_2 = 0.01$, we specify

$$L = 0.2\,V(\bar{y}_{1st}) + 0.8\,V(\bar{y}_{2st}) = \frac{2(0.04)(0.01)}{0.05} = 0.016$$

From (5A.18), with $c_h = 1$, we have

$$n = \frac{\left(\sum W_h A_h\right)^2}{L} = \frac{(3.416)^2}{0.016} = 729$$

using column (5) of Table 5A.6. From (5A.17), column (5) also leads to the $n_h$
values shown in column (6) of Table 5A.6. As columns (6) and (7) show, the two
solutions $n_h$ and $\hat{n}_h$ agree well.
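
Columns (4) to (6) of Table 5A.6 can be reproduced as follows. This is a sketch of the approximate method only, with $c_h = 1$ and the fpc ignored as in the example; the variable names are ours. The last two lines evaluate the variances actually attained by the approximate allocation, to show which tolerance it meets.

```python
import math

W = [0.4, 0.3, 0.2, 0.1]        # column (1)
S2 = [
    [25, 25, 25, 25],           # column (2): variances of variate 1
    [1, 4, 16, 64],             # column (3): variances of variate 2
]
V = [0.04, 0.01]                # desired variances V_j

# a_j inversely proportional to V_j, scaled to add to 1.
inverse = [1.0 / v for v in V]
a = [x / sum(inverse) for x in inverse]
V_star = sum(aj * vj for aj, vj in zip(a, V))          # (5A.20)

# A_h = sqrt(sum_j a_j S_jh^2) and W_h * A_h, columns (4) and (5).
A = [math.sqrt(sum(a[j] * S2[j][h] for j in range(len(V)))) for h in range(len(W))]
WA = [w * ah for w, ah in zip(W, A)]

n = sum(WA) ** 2 / V_star                              # (5A.18) with c_h = 1
n_h = [n * x / sum(WA) for x in WA]                    # (5A.17) with c_h = 1

print(round(n), [round(x) for x in n_h])

# Variances attained for each variate under this allocation.
attained = [sum(w ** 2 * S2[j][h] / n_h[h] for h, w in enumerate(W)) for j in range(len(V))]
print([round(v, 4) for v in attained])
```

With these data the attained variance is below 0.04 for the first variate but slightly above 0.01 for the second, so the approximate solution meets the single constraint on L without meeting every individual tolerance.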

As Booth and Sedransk note, $n \le \hat{n}$ in all problems of this type, since n
satisfies the single constraint $L = V^*$, but it need not satisfy the constraint on
every variate, whereas the $\hat{n}$ allocation satisfies the constraint on L as
well as the individual constraints.


5A.5 TWO-WAY STRATIFICATION WITH SMALL SAMPLES


Suppose that there are two criteria of stratification, say by R rows and C
columns, making RC cells. If $n \ge RC$, every cell can be represented in the
sample. A problem arises when n < RC, and we would like the sample to give
proportional representation to each criterion of stratification. In a simple method
developed by Bryant, Hartley, and Jessen (1960) the technique requires only that n
exceed the greater of R and C.

To illustrate this method, suppose that a small population of 165 schools has
been stratified by size of city into five classes and by average expenditure per
pupil into four classes. The numbers of schools $m_{ij}$ and the proportions of
schools $P_{ij} = m_{ij}/165$ in each of the 20 cells are shown in Table 5A.7.


TABLE 5A.7
NUMBER AND PROPORTION OF SCHOOLS IN EACH CELL

Size of                       Expenditure per Pupil
City                  A        B        C        D      |  Totals               $n_{i.}$
I       $m_{1j}$      15       21       17       9      |  $m_{1.}$   62
        $P_{1j}$      0.091    0.127    0.103    0.055  |  $P_{1.}$   0.376     4
II      $m_{2j}$      10       8        13       7      |  $m_{2.}$   38
        $P_{2j}$      0.061    0.049    0.079    0.042  |  $P_{2.}$   0.231     2
III     $m_{3j}$      6        9        5        8      |  $m_{3.}$   28
        $P_{3j}$      0.036    0.055    0.030    0.049  |  $P_{3.}$   0.170     2
IV      $m_{4j}$      4        3        6        6      |  $m_{4.}$   19
        $P_{4j}$      0.024    0.018    0.036    0.036  |  $P_{4.}$   0.114     1
V       $m_{5j}$      3        2        5        8      |  $m_{5.}$   18
        $P_{5j}$      0.018    0.012    0.030    0.049  |  $P_{5.}$   0.109     1

Totals  $m_{.j}$      38       43       46       38        165
        $P_{.j}$      0.230    0.261    0.278    0.231     1.000
        $n_{.j}$      2        3        3        2


The objective is to give each school an approximately equal chance of selection
while giving each marginal class its proportional representation. In this
illustration n = 10. Compute the numbers $n_{i.} = nP_{i.}$ and $n_{.j} = nP_{.j}$,
where these products are rounded to the nearest integers (with a further minor
adjustment, if needed, so that the $n_{i.}$ and the $n_{.j}$ both add to n). These
numbers are shown in Table 5A.7.

The next step is to draw n = 10 cells with probability $n_{i.}n_{.j}/n^2$ for the
ijth cell. This is done by constructing an n × n square (Table 5A.8). In row 1 one
column is drawn at random. In row 2 one of the remaining columns is drawn at
random, and so on. At the end, each row and column contains one unit. (This draw is
most quickly made by a random permutation of the numbers 1 to 10.) The results of
one draw are indicated by X's in Table 5A.8.


TABLE 5A.8 
10 × 10 SQUARE FOR DRAWING THE SAMPLE
                    Column
          1   2   3   4   5   6   7   8   9   10
Row       A       B           C           D
1 x 
2 I x 
3 x 
4 x 
5 x 
6 1r x 
7 HI x 
8 x 
9 IV x 
10 Ne x 


Note that columns 1 and 2 are assigned to marginal stratum A, since $n_{.1} = 2$.
Similarly, rows 1 through 4 are assigned to marginal stratum I, since $n_{1.} = 4$,
and so on. This completes the allocation of the sample to the 20 cells. The
allocation appears in more compact form in Table 5A.9. Two schools are drawn at
random from the 15 schools in cell IA, and so on. The probability that a school in
row i, column j is drawn is proportional to $n_{i.}n_{.j}/P_{ij}$. Thus the
probabilities are not equal, although they will be approximately so if
$P_{ij} \doteq n_{i.}n_{.j}/n^2$.

An unbiased estimate of the mean per school is

$$\bar{y}_U = \frac{1}{n}\sum \frac{n^2 P_{ij}}{n_{i.}n_{.j}}\,y_{ij}$$

where $y_{ij}$ is the sample total in the ijth cell. If, however,
$P_{ij} \doteq n_{i.}n_{.j}/n^2$, the sample mean $\bar{y}$ is probably preferable,
since its bias should be negligible. A sample estimate of variance is available for
both the unbiased and biased estimates,

TABLE 5A.9
ALLOCATION OF THE SAMPLE TO THE 20 CELLS
         A    B    C    D    Total
I        2    1    1    0    4
il On 07>) 50 2 
Ill CERO, 2 
IV       0    1    0    0    1
v DE AOS N 1 
Total    2    3    3    2    10


provided that n is at least twice the greater of R and C and that at least two units 
are drawn in every row and column. 

If $P_{ij}$ differs markedly from $n_{i.}n_{.j}/n^2$ in some cells, an extra step
keeps the probabilities of selection of schools more nearly constant. After
computing the $n_{i.}$ and $n_{.j}$, examine the quantities
$D_{ij} = nP_{ij} - n_{i.}n_{.j}/n$, after rounding them to integers. If, in any
cell, $D_{ij}$ is a positive integer, automatically assign $D_{ij}$ units to this
cell. Reduce n, the $n_{i.}$, and the $n_{.j}$ by the amounts required by this fixed
allocation and carry out the remaining allocation as before.
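
The drawing of the n × n square by a random permutation can be sketched as follows. This is a minimal illustration of the basic method only (without the extra step for cells with positive $D_{ij}$); the function name and the use of a seeded generator are our own choices. Rows of the square are assigned to the row strata and columns to the column strata in blocks of sizes $n_{i.}$ and $n_{.j}$, and the permutation decides which column of the square is marked in each row.

```python
import random
from collections import Counter


def two_way_allocation(n_row, n_col, rng=random):
    """Allocate n units to the R x C cells from the marginal sizes.

    n_row[i] and n_col[j] are the marginal sample sizes; both must add to n.
    Returns an R x C list of lists of cell sample sizes.
    """
    n = sum(n_row)
    if n != sum(n_col):
        raise ValueError("row and column marginals must both add to n")
    # Stratum label of each row and each column of the n x n square.
    row_of = [i for i, k in enumerate(n_row) for _ in range(k)]
    col_of = [j for j, k in enumerate(n_col) for _ in range(k)]
    perm = list(range(n))
    rng.shuffle(perm)          # one marked column per row, no column repeated
    counts = Counter((row_of[r], col_of[perm[r]]) for r in range(n))
    return [[counts[(i, j)] for j in range(len(n_col))] for i in range(len(n_row))]


n_row = [4, 2, 2, 1, 1]        # marginal sizes for city-size classes I to V
n_col = [2, 3, 3, 2]           # marginal sizes for expenditure classes A to D
allocation = two_way_allocation(n_row, n_col, random.Random(1))
for label, cells in zip(["I", "II", "III", "IV", "V"], allocation):
    print(label, cells)
```

Whatever permutation is drawn, the row and column totals of the resulting allocation equal the marginal sizes, which is the property the method is designed to give.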


5A.6 CONTROLLED SELECTION 


Another technique for this problem with small samples was named controlled
selection by Goodman and Kish (1950). A simple illustration given by Hess,
Riedel, and Fitzpatrick (1976), who applied the method for sampling hospitals,
shows the basic idea. The principal stratification is by size of hospital (two
strata). Representation of each of two types of ownership of the hospitals is also
desirable, but only one unit (hospital) is to be drawn from each principal stratum.
In numbering the units within strata, a prime (') indicates one ownership type, and
absence of a prime indicates the other type. Table 5A.10 (left side) shows the units
in the two strata.


TABLE 5A.10
ORDERING OF UNITS WITHIN STRATA FOR CONTROLLED SELECTION

          Original Order                          Revised Order
             Stratum                                 Stratum
I-Large Hospital    II-Small Hospital    Large Hospital    Small Hospital
1                   1                    1                 3'
2                   2                    2                 4'
3'                  3'                   3'                5'
4'                  4'                   4'                1
                    5'                                     2

If unit 1 or 2 is drawn from stratum I, we would like to draw units 3’, 4’, or 5’ 
from stratum II, so that both types of stratification are present with n =2. 
Similarly, 3’ or 4’ (stratum I) is desired with 1 or 2 (stratum II). Controlled 
selection makes the probability of these desired combinations as high as is 
mathematically possible, while retaining equal probability selection within strata 
and therefore unbiased estimates by the usual formulas for stratified sampling. 
The purpose is either increased accuracy for given n or a saving in field costs. 

In stratified random sampling the probability of a desired combination is
(.5)(.6) + (.5)(.4) = .5. This probability can be increased to .9 by two simple
changes in sample selection. Rearrange the units in stratum II so that the desired
combinations (3’, 4’, 5’) with 1 and 2 in stratum I come first, as on the right in Table
5A.10. Then draw a random number r between 1 and 100 and use it to select the
units from both strata. In stratum I, $1 \le r \le 25$ selects unit 1, $26 \le r \le 50$ selects
unit 2, and so on, so as to give each unit the desired one-fourth probability of being
chosen. Similarly, in stratum II, $1 \le r \le 20$ selects 3’, $21 \le r \le 40$ selects 4’, and so
on. Hence, if $1 \le r \le 20$, we select (1, 3’), if $21 \le r \le 25$, we select (1, 4’), and so on.
The joint selections and their probabilities are as follows.


Pair:          (1,3’)  (1,4’)  (2,4’)  (2,5’)  (3’,5’)  (3’,1)  (4’,1)  (4’,2)
Probability:    .20     .05     .15     .10     .10      .15     .05     .20


The only nondesired combination is (3’, 5’). Thus the total probability of the
desired combinations is .90.
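
The selection scheme just described can be checked numerically. The Python sketch below is illustrative only: the names `controlled_pairs` and `is_desired` are hypothetical, and it assumes that each stratum splits the range 1 to 100 into equal consecutive blocks, one per unit, as in the text. It tabulates the joint probabilities for the revised order and compares the chance of a desired combination with the chance under independent selection in the two strata.

```python
from collections import Counter
from itertools import product

# Unit labels from Table 5A.10; a trailing prime marks one ownership type.
STRATUM_I = ["1", "2", "3'", "4'"]
STRATUM_II_REVISED = ["3'", "4'", "5'", "1", "2"]


def is_desired(pair):
    """A pair is desired when the two hospitals differ in ownership type."""
    return pair[0].endswith("'") != pair[1].endswith("'")


def controlled_pairs(order_1, order_2, r_max=100):
    """Joint selection probabilities when one random number r picks both units.

    Each stratum splits 1..r_max into equal consecutive blocks, one per unit,
    so every unit keeps its equal within-stratum selection probability.
    """
    counts = Counter()
    for r in range(1, r_max + 1):
        unit_1 = order_1[(r - 1) * len(order_1) // r_max]
        unit_2 = order_2[(r - 1) * len(order_2) // r_max]
        counts[(unit_1, unit_2)] += 1
    return {pair: count / r_max for pair, count in counts.items()}


joint = controlled_pairs(STRATUM_I, STRATUM_II_REVISED)
p_controlled = sum(p for pair, p in joint.items() if is_desired(pair))

# Independent draws: each of the 4 x 5 pairs is equally likely.
all_pairs = list(product(STRATUM_I, STRATUM_II_REVISED))
p_independent = sum(is_desired(pair) for pair in all_pairs) / len(all_pairs)

for pair, p in joint.items():
    print(pair, f"{p:.2f}")
print(f"controlled: {p_controlled:.2f}, independent: {p_independent:.2f}")
```

The sketch relies on 100 being divisible by both stratum sizes; with other sizes the blocks would not be exactly equal and a different range for r would be needed.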

Since sampling is not independent in the two strata, the formulas for $V(\bar{y}_{st})$ and
$v(\bar{y}_{st})$ do not apply. Hess, Riedel, and Fitzpatrick (1976) give approximate
formulas. This monograph also gives an algorithm for the application of controlled
selection in problems with more strata, larger n, and more complex controls.

For another approach using balanced incomplete block designs, see Avadhani 
and Sukhatme (1973). 


5A.7 THE CONSTRUCTION OF STRATA 


This topic raises several questions. What is the best characteristic for the 
construction of strata? How should the boundaries between the strata be deter- 
mined? How many strata should there be? For a single item or variable y the best 
characteristic is clearly the frequency distribution of y itself. The next best is 
presumably the frequency distribution of some other quantity highly correlated 
with y. Given the number of strata, the equations for determining the best stratum 
boundaries under proportional and Neyman allocation have been worked out by 
Dalenius (1957), and quicker approximate methods by several workers. We will 
consider Neyman allocation, since it is usually superior to proportional allocation
in populations in which gains from stratification are greatest. It is assumed at first
that the strata are set up by using the value of y itself. 


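Let $y_0$, $y_L$ be the smallest and largest values of y in the population. The problem
is to find intermediate stratum boundaries $y_1, y_2, \ldots, y_{L-1}$ such that

$$V(\bar{y}_{st}) = \frac{1}{n}\left(\sum_{h=1}^{L} W_h S_h\right)^2 - \frac{1}{N}\sum_{h=1}^{L} W_h S_h^2 \tag{5A.21}$$

is a minimum. If the fpc is ignored, it is sufficient to minimize $\sum W_h S_h$. Since $y_h$
appears in this sum only in the terms $W_h S_h$ and $W_{h+1} S_{h+1}$, we have

$$\frac{\partial}{\partial y_h}\left(\sum W_h S_h\right) = \frac{\partial}{\partial y_h}(W_h S_h) + \frac{\partial}{\partial y_h}(W_{h+1} S_{h+1})$$

Now if f(y) is the frequency function of y,

$$W_h = \int_{y_{h-1}}^{y_h} f(t)\,dt, \qquad \frac{\partial W_h}{\partial y_h} = f(y_h) \tag{5A.22}$$

Further,

$$W_h S_h^2 = \int_{y_{h-1}}^{y_h} t^2 f(t)\,dt - \frac{\left[\int_{y_{h-1}}^{y_h} t f(t)\,dt\right]^2}{\int_{y_{h-1}}^{y_h} f(t)\,dt} \tag{5A.23}$$

Differentiation of (5A.23) gives

$$S_h^2 \frac{\partial W_h}{\partial y_h} + 2 W_h S_h \frac{\partial S_h}{\partial y_h} = y_h^2 f(y_h) - 2 y_h \mu_h f(y_h) + \mu_h^2 f(y_h) \tag{5A.24}$$

where $\mu_h$ is the mean of y in stratum h. Add $S_h^2\,\partial W_h/\partial y_h$ to the left side, and the
equal quantity $S_h^2 f(y_h)$ to the right side. This gives, on dividing by $2S_h$,

$$\frac{\partial (W_h S_h)}{\partial y_h} = S_h \frac{\partial W_h}{\partial y_h} + W_h \frac{\partial S_h}{\partial y_h} = \frac{1}{2} f(y_h)\,\frac{(y_h - \mu_h)^2 + S_h^2}{S_h} \tag{5A.25}$$

Similarly we find

$$\frac{\partial (W_{h+1} S_{h+1})}{\partial y_h} = -\frac{1}{2} f(y_h)\,\frac{(y_h - \mu_{h+1})^2 + S_{h+1}^2}{S_{h+1}} \tag{5A.26}$$

Hence the calculus equations for $y_h$ are

$$\frac{(y_h - \mu_h)^2 + S_h^2}{S_h} = \frac{(y_h - \mu_{h+1})^2 + S_{h+1}^2}{S_{h+1}} \qquad (h = 1, 2, \ldots, L-1) \tag{5A.27}$$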


Unfortunately, these equations are ill adapted to practical computation, since
both $\mu_h$ and $S_h$ depend on $y_h$. A quick approximate method, due to Dalenius and
Hodges (1959), is presented for minimizing $\sum W_h S_h$. Let

$$Z(y) = \int_{y_0}^{y} \sqrt{f(t)}\,dt \tag{5A.28}$$

If the strata are numerous and narrow, f(y) should be approximately constant
(rectangular) within a given stratum. Hence,

$$W_h = \int_{y_{h-1}}^{y_h} f(t)\,dt \doteq f_h (y_h - y_{h-1}) \tag{5A.29}$$

$$S_h \doteq \frac{1}{\sqrt{12}}(y_h - y_{h-1}) \tag{5A.30}$$

$$Z_h - Z_{h-1} = \int_{y_{h-1}}^{y_h} \sqrt{f(t)}\,dt \doteq \sqrt{f_h}\,(y_h - y_{h-1}) \tag{5A.31}$$

where $f_h$ is the “constant” value of f(y) in stratum h. By substituting these
approximations, we find

$$\sqrt{12} \sum_{h=1}^{L} W_h S_h \doteq \sum_{h=1}^{L} f_h (y_h - y_{h-1})^2 \doteq \sum_{h=1}^{L} (Z_h - Z_{h-1})^2 \tag{5A.32}$$

Since $(Z_L - Z_0)$ is fixed, it is easy to verify that the sum on the right is minimized by
making $(Z_h - Z_{h-1})$ constant.

Given f(y), the rule is to form the cumulative of $\sqrt{f(y)}$ and choose the $y_h$ so that
they create equal intervals on the cum $\sqrt{f(y)}$ scale. Table 5A.11 illustrates the use
of the rule.


TABLE 5A.11
CALCULATION OF STRATUM BOUNDARIES BY THE CUM $\sqrt{f(y)}$ RULE

Industrial Loans / Total Loans (%)      f(y)      Cum $\sqrt{f(y)}$


Example. The data show the frequency distribution of the percentage of bank loans
devoted to industrial loans in a population of 13,435 banks of the United States
(McEvoy, 1956). The distribution is skew, with its mode at the lower end. In the cum $\sqrt{f}$
column, $58.9 = \sqrt{3464}$, $109.1 = \sqrt{3464} + \sqrt{2516}$, and so on.

Suppose that we want five strata. Since the total of cum $\sqrt{f}$ is 389.5, the division points
should be at 77.9, 155.8, 233.7, and 311.6 on this scale. The nearest available points are as
follows:


Stratum                        1        2        3        4        5
Boundaries                   0-5%    5-15%   15-25%   25-45%   45-100%
Interval on cum $\sqrt{f}$    58.9     96.6     73.6     85.6     74.8


The first two intervals, 58.9 and 96.6, are rather unequal, but cannot be improved on
without a finer subdivision of the original classes.
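
The cum $\sqrt{f}$ rule is easy to mechanize. The following Python sketch is a minimal illustration, not a definitive implementation: the function names are hypothetical, the class frequencies are made-up values rather than the bank data, and the classes are assumed to be of equal width. It forms the cumulative of $\sqrt{f}$ and picks, for each division point, the nearest available class limit.

```python
import math


def cum_sqrt_f(freqs):
    """Running total of sqrt(f) over classes of equal width."""
    cum, total = [], 0.0
    for f in freqs:
        total += math.sqrt(f)
        cum.append(total)
    return cum


def stratum_boundaries(upper_limits, freqs, n_strata):
    """Class upper limits nearest to equal steps on the cum sqrt(f) scale."""
    cum = cum_sqrt_f(freqs)
    step = cum[-1] / n_strata
    chosen = []
    for k in range(1, n_strata):
        target = k * step
        nearest = min(range(len(cum)), key=lambda i: abs(cum[i] - target))
        chosen.append(upper_limits[nearest])
    return chosen


# Hypothetical distribution: eight classes of width 10, mode at the lower end.
upper_limits = [10, 20, 30, 40, 50, 60, 70, 80]
freqs = [400, 225, 100, 64, 36, 25, 9, 4]

cum = cum_sqrt_f(freqs)
boundaries = stratum_boundaries(upper_limits, freqs, n_strata=3)
print(cum)         # cumulative sqrt(f) by class
print(boundaries)  # class limits chosen as stratum boundaries
```

As in the worked example, only existing class limits can be chosen, so the resulting intervals on the cum $\sqrt{f}$ scale are only approximately equal. With very coarse classes two division points could map to the same limit; the sketch does not guard against that case.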

If the class intervals in the original distribution of y are of unequal length, a
slight change is needed. When the interval changes from one of length d to one of
length ud, the value of $\sqrt{f}$ for the second interval is multiplied by $\sqrt{u}$ when forming
cum $\sqrt{f}$.

Another method, proposed by Sethi (1963), is to work out the boundaries given
by the calculus equations (5A.27) for a standard continuous distribution resembling
the study population. For the normal and various $\chi^2$ distributions, Sethi has
tabulated the optimum boundaries for Neyman, equal, and proportional allocation
for $L \le 6$. If one of these distributions seems to approximate that in the study
population, the boundaries can be read from Sethi’s tables.

Two further approximate methods require some trial and error. From relations
(5A.32), the Dalenius-Hodges rule is roughly equivalent to making $W_h S_h$ constant,
as conjectured earlier by Dalenius and Gurney (1951). A similar rule is that
of Ekman (1959), who makes $W_h(y_h - y_{h-1})$ constant.

In comparisons on s… (1961) found that the c… method.

The relations (5A.32) have an interesting consequence. If $W_h S_h$ is constant,
Neyman allocation gives a constant sample size $n_h = n/L$ in all strata. For the
approximate methods, the comparisons that have been made suggest that the
simple rule $n_h = n/L$ is satisfactory.

Thus far we have made the unrealistic assumption that stratification can be
based on the values of y itself. In practice, some other variable x is used (perhaps
the value of y at a recent census). Dalenius (1957) develops equations for the
boundaries of x that minimize $\sum W_h S_h$, given a knowledge of the regression of y
on x. If this regression is nonlinear, these boundaries may differ considerably from
those that are optimum when x itself is the variable to be measured. The equations
indicate, however, that if the regression of y on x is linear and the correlation
between y and x is high within all strata, the two sets of boundaries should be
nearly the same. Let

$$y = \alpha + \beta x + e$$

where $E(e) = 0$ for all x and e, x are uncorrelated. The variance of e within
stratum h is $S_{eh}^2$. Then the x-boundaries that make $V(\bar{y}_{st})$ a minimum satisfy the
equations (Dalenius, 1957)

$$\frac{\beta^2[(x_h - \mu_{xh})^2 + S_{xh}^2] + 2S_{eh}^2}{\beta S_{xh}\sqrt{1 + S_{eh}^2/\beta^2 S_{xh}^2}} = \frac{\beta^2[(x_h - \mu_{x,h+1})^2 + S_{x,h+1}^2] + 2S_{e,h+1}^2}{\beta S_{x,h+1}\sqrt{1 + S_{e,h+1}^2/\beta^2 S_{x,h+1}^2}}$$

If $S_{eh}^2/\beta^2 S_{xh}^2$ is small for all h, these equations reduce to the form (5A.27) that
gives optimum boundaries for x. But $S_{eh}^2/\beta^2 S_{xh}^2 = (1 - \rho_h^2)/\rho_h^2$, where $\rho_h$ is the
correlation between y and x within stratum h.

Although more investigation is needed, this result suggests that the cum $\sqrt{f}$ rule
applied to x should give an efficient stratification for another variable y that has a
linear regression on x with high correlation. Some numerical results by Cochran
(1961) support this conjecture. Moreover, if the $\rho_h$ are only moderate, as will
happen when the number of strata is increased, failure to use the optimum
x-boundaries should have a less deleterious effect on y.

The preceding discussion is, of course, mainly relevant to the sampling of 
institutions stratified by some measure of size. The situation is different when one 
set of variables is closely related to $y_1$, and another set, with a markedly different
frequency distribution, is closely related to $y_2$. One possibility is to seek compromise
stratum boundaries that meet the desired tolerances on $V(\bar{y}_{1st})$ and $V(\bar{y}_{2st})$,
following a general approach given in section 5A.4, but computational methods
have not been worked out.

In geographical stratification the problem is less amenable to a mathematical 
approach, since there are so many different ways in which stratum boundaries may 
be formed. The usual procedure is to select a few variables that have high 
correlations with the principal items in the survey and to use a combination of 
judgment and trial and error to construct boundaries that are good for these 
selected variables. Since the gains in precision from stratification are likely to be 
modest, it is not worthwhile to expend a great deal of effort in improving 
boundaries. Bases of stratification for economic items have been discussed by 
Stephan (1941) and Hagood and Bernert (1945) and for farm items by King and 
McCarty (1941).


5A.8 NUMBER OF STRATA


The two questions relevant to a decision about the number of strata L are (a) at
what rate does the variance of $\bar{y}_{st}$ decrease as L is increased? (b) How is the cost of
the survey affected by an increase in L?

As regards (a), suppose first that strata are constructed by the values of y. To
take the simplest case, let the distribution of y be rectangular in the interval
$(a, a + d)$. Then $S_y^2$, before stratification, is $d^2/12$, so that with a simple random
sample of size n, $V(\bar{y}) = d^2/12n$. If L strata of equal size are created, the variance
within any stratum is $S_{yh}^2 = d^2/12L^2$. Hence, for a stratified sample, with $W_h = 1/L$
and $n_h = n/L$,

$$V(\bar{y}_{st}) = \frac{1}{n}\left(\sum_{h=1}^{L} W_h S_{yh}\right)^2 = \frac{1}{n}\left(\sum_{h=1}^{L} \frac{1}{L}\cdot\frac{d}{\sqrt{12}\,L}\right)^2 = \frac{d^2}{12nL^2} = \frac{V(\bar{y})}{L^2} \tag{5A.33}$$

Thus with a rectangular distribution the variance of $\bar{y}_{st}$ decreases inversely as the
square of the number of strata. Rather remarkably, this relation continues to hold,
roughly, when actual skew distributions with finite range are stratified with the
optimum choice of boundaries for Neyman allocation. In eight distributions of
data of the type likely to occur in practice, Cochran (1961) found that the average
values of $V(\bar{y}_{st})/V(\bar{y})$ were 0.232, 0.098, and 0.053 for L = 2, 3, 4, as compared
with 0.250, 0.111, and 0.062 for the rectangular distribution.

These results, which suggest that multiplication of strata is profitable, give a
misleading picture of what happens when some other variable x is used to
construct the strata. If $\phi(x) = E(y|x)$ is the regression of y on x, we may write

$$y = \phi(x) + e \tag{5A.34}$$

where $\phi$ and e are uncorrelated. Hence

$$S_y^2 = S_\phi^2 + S_e^2 \tag{5A.35}$$

By the preceding results, creation of L optimal strata for x may reduce $S_\phi^2$ to
$S_\phi^2/L^2$ if $\phi(x)$ is linear, or at a smaller … not reduced by stratification on x. …
later at which the term $S_e^2$ dominates. … in L will produce only a
trivial proportional reduction in $V(\bar{y}_{st})$.

How quickly the point of diminishing … factors, particularly the relative sizes …
stratum boundaries by means of x reduces $V(\bar{x}_{st})$ at a rate proportional to $1/L^2$. Thus

$$V(\bar{x}_{st}) = \frac{L}{n}\sum_{h=1}^{L} W_h^2 S_{xh}^2 = \frac{S_x^2}{nL^2} \tag{5A.36}$$

Suppose also that the regression of y on x is linear, that is,

$$y = \alpha + \beta x + e \tag{5A.37}$$

where $S_e$ is constant. Then,

$$V(\bar{y}_{st}) = \frac{L}{n}\sum_{h=1}^{L} W_h^2 S_{yh}^2 = \frac{L\beta^2}{n}\sum_{h=1}^{L} W_h^2 S_{xh}^2 + \frac{L S_e^2}{n}\sum_{h=1}^{L} W_h^2 \tag{5A.38}$$

For any set of L strata, $\sum_h W_h^2 \ge 1/L$. Using (5A.36), we have

$$V(\bar{y}_{st}) \ge \frac{1}{n}\left(\frac{\beta^2 S_x^2}{L^2} + S_e^2\right) = \frac{S_y^2}{n}\left[\frac{\rho^2}{L^2} + (1 - \rho^2)\right] \tag{5A.39}$$

where $\rho$ is the correlation between y and x in the unstratified population.

With this model, Table 5A.12 shows $V(\bar{y}_{st})/V(\bar{y})$ for $\rho$ = 0.99, 0.95, 0.90, and
0.85 and L = 2 to 6, assuming that relation (5A.39) is an equality. The right-hand
columns of the table give $V(\bar{y}_{st})/V(\bar{y})$ for three sets of actual data, described
under the table, in which x is the value of y at some earlier time.
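
Under the linear regression model, treating (5A.39) as an equality and dividing by $V(\bar{y}) = S_y^2/n$ gives the ratio $\rho^2/L^2 + (1 - \rho^2)$. The short Python sketch below, with a hypothetical function name `variance_ratio`, evaluates this expression for the values of $\rho$ and L named in the text. It is offered only as an illustration of the model and does not reproduce the columns for the actual data sets.

```python
def variance_ratio(rho, n_strata):
    """V(ybar_st) / V(ybar) when relation (5A.39) holds as an equality."""
    return rho**2 / n_strata**2 + (1.0 - rho**2)


for n_strata in range(2, 7):
    row = [variance_ratio(rho, n_strata) for rho in (0.99, 0.95, 0.90, 0.85)]
    print(n_strata, " ".join(f"{value:.3f}" for value in row))
```

However large L becomes, the ratio cannot fall below $1 - \rho^2$, which is why the gains flatten out quickly unless $\rho$ is very close to 1. Setting $\rho = 1$ recovers the $1/L^2$ rate of the rectangular case.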

The results for the regression model indicate that unless $\rho$ exceeds 0.95, little
reduction in variance is to be expected beyond L = 6. Data sets 2 and 3 support this
conclusion, although some further increase in L might be profitable with the
college enrollment data (set 1). In two comparisons on survey data, Hess, Sethi,
and Balakrishnan (1966) found that $V(\bar{y}_{st})$ decreased faster with L than (5A.39)
predicts, which suggests that model (5A.37) is oversimplified.


TABLE 5A.12
$V(\bar{y}_{st})/V(\bar{y})$ AS A FUNCTION OF L FOR THE LINEAR REGRESSION MODEL AND FOR
SOME ACTUAL DATA

        Linear Regression Model, $\rho$ =         Data, Set
 L      0.99    0.95    0.90    0.85           1     2     3

 Set    Type of Data             x       y       Source
  1     College enrollments     1952    1958     Cochran (1961)
  2     City sizes              1940    1950     Cochran (1961)
  3     Family incomes          1929    1933     Dalenius and Gurney (1951)

To complete this analysis, we require a cost function that shows how the cost 
depends on L. Dalenius (1957) suggests the relation $C = LC_s + nC_n$. The cost ratio
$C_s/C_n$ will vary with the type of survey. An increase in the number of strata
involves extra work in planning and drawing the sample and increases the number 
of weights used in computing the estimates, unless they are self-weighting. In 
some surveys almost no change is required in the organization of the field work; in 
others a separate field unit is set up in each stratum. Whatever the form of the cost 
function, the results in Table 5A.12 suggest that if an increase in L beyond 6 
necessitates any substantial decrease in n in order to keep the cost constant the 
increase will seldom be profitable. 

The discussion in this section is confined to surveys in which only over-all 
estimates are to be made. If estimates are wanted also for geographic subdivisions 
of the population, the argument for a larger number of strata is stronger. 


5A.9 STRATIFICATION AFTER SELECTION OF THE SAMPLE 
(POSTSTRATIFICATION) 


With some variables that are suitable for stratification, the stratum to which a
unit belongs is not known until the data have been collected. Personal characteristics
such as age, sex, race, and educational level are common examples. The
stratum sizes $N_h$ may be obtainable fairly accurately from official statistics, but the
units can be classified into the strata only after the sample data are known. We
assume here that $W_h$, $N_h$ are known.

One procedure is to take a simple random sample of size n and classify the units.
Instead of the sample mean $\bar{y}$, we use the estimate $\bar{y}_w = \sum W_h \bar{y}_h$, where $\bar{y}_h$ is the
mean of the sample units that fall in stratum h, and $W_h = N_h/N$. This method is
almost as precise as proportional stratified sampling, provided that (a) the sample
is reasonably large, say >20, in every stratum, and (b) the effects of errors in the
weights $W_h$ can be ignored (see section 5A.2).


To show this, let $m_h$ be the number of units in the sample that fall in stratum h,
where $m_h$ will vary from sample to sample. For samples in which the $m_h$ are fixed
and all $m_h$ exceed zero,

$$V(\bar{y}_w) = \sum_h \frac{W_h^2 S_h^2}{m_h} - \frac{1}{N}\sum_h W_h S_h^2 \tag{5A.40}$$

The average value of $V(\bar{y}_w)$ in repeated samples of size n must now be
calculated. This requires a little care, since one or more of the $m_h$ could be zero. If
this happened, two or more strata would have to be combined before making the
estimate, and a less precise estimate would be produced. With increasing n, the
probability that any $m_h$ is zero becomes so small that the contribution to the
variance from this source is negligible.

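If the case in which $m_h$ is zero is ignored, Stephan (1945) has shown that to
terms of order $n^{-2}$

$$E\left(\frac{1}{m_h}\right) = \frac{1}{nW_h} + \frac{1 - W_h}{n^2 W_h^2} \tag{5A.41}$$

Hence

$$E[V(\bar{y}_w)] \doteq \frac{1 - f}{n}\sum_h W_h S_h^2 + \frac{1}{n^2}\sum_h (1 - W_h) S_h^2 \tag{5A.42}$$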


The first term is the value of $V(\bar{y}_{st})$ for proportional stratification. The second
represents the increase in variance that arises because the $m_h$ do not distribute
themselves proportionally. But

$$\frac{1}{n^2}\sum_h (1 - W_h) S_h^2 \doteq \frac{L-1}{n^2}\,\bar{S}_h^2 = \frac{L-1}{L\bar{n}_h}\cdot\frac{\bar{S}_h^2}{n} \doteq \frac{L-1}{L\bar{n}_h}\cdot\frac{1}{n}\sum_h W_h S_h^2 \tag{5A.43}$$

where $\bar{S}_h^2$ is the average of the $S_h^2$ and $\bar{n}_h = n/L$ is the average number of units per
stratum. Thus, if the $S_h^2$ do not differ greatly, the increase is about $(L - 1)/L\bar{n}_h$
times the variance for proportional stratification, ignoring the fpc. The increase
will be small if $\bar{n}_h$ is reasonably large.
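
A small numerical sketch may help fix these results. The Python code below is illustrative only: the function names and all input values are hypothetical. It computes the poststratified estimate $\bar{y}_w = \sum W_h \bar{y}_h$ from classified sample values, and evaluates separately the two terms of (5A.42) so that the relative increase over proportional stratification can be inspected.

```python
def poststratified_mean(weights, samples):
    """ybar_w = sum of W_h * ybar_h; samples[h] holds the values in stratum h."""
    return sum(W * sum(ys) / len(ys) for W, ys in zip(weights, samples))


def expected_variance_terms(weights, s2, n, N=None):
    """The two terms of (5A.42); the fpc is ignored when N is None."""
    f = n / N if N else 0.0
    proportional = (1.0 - f) / n * sum(W * s for W, s in zip(weights, s2))
    increase = sum((1.0 - W) * s for W, s in zip(weights, s2)) / n**2
    return proportional, increase


# Hypothetical classified sample: three strata with known weights W_h.
weights = [0.5, 0.3, 0.2]
samples = [[2.0, 4.0], [6.0], [10.0, 14.0]]
ybar_w = poststratified_mean(weights, samples)

# Hypothetical planning values: L = 4 strata, equal S_h^2, n = 100, no fpc.
proportional, increase = expected_variance_terms([0.25] * 4, [9.0] * 4, n=100)
relative_increase = increase / proportional
print(ybar_w, proportional, increase, relative_increase)
```

With equal $S_h^2$ the relative increase in the second example equals $(L - 1)/L\bar{n}_h$ with $\bar{n}_h = 25$, in line with (5A.43). The sketch assumes every stratum receives at least one sample unit; an empty stratum would need to be combined with another before estimating.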

This method can also be applied to a sample that is already stratified by another
factor, for example, into five geographic regions, provided that the W, are known 
separately within each region. This twofold stratification is widely employed in 
U.S. National Surveys: see Bean (1970) for a description of the estimation 
formulas in the Health Interview Survey of the National Center for Health 
Statistics. 


5A.10 QUOTA SAMPLING 


In another method that has been used in opinion and market research surveys 
the $n_h$ required in each stratum is computed in advance so that stratification is
proportional. The enumerator is instructed to continue sampling until the neces- 
sary “quota” has been obtained in each stratum. The most common variables for 
stratification are geographic area, age, sex, race, and some measure of economic 
level. If the enumerator were to choose persons at random within the geographic 
areas and assign each to his appropriate stratum, the method would be identical 
with stratified random sampling. A considerable amount of field work would be 
required to fill all quotas, however, since in the later stages most of the persons 
approached would fall in quotas already filled. 

To expedite the filling of quotas, some latitude is allowed to the enumerator 
regarding the persons or households to be included. The amount of latitude varies
with the agency but, in general, quota sampling may be described as stratified
sampling with a more or less nonrandom selection of units within strata. For this 
reason, sampling-error formulas cannot be applied with confidence to the results 
of quota samples. A number of comparisons between the results of quota and 
probability samples are summarized by Stephan and McCarthy (1958), who give 
an excellent critique of the performance of both types of survey. The quota 
method seems likely to produce samples that are biased on characteristics such as 
income, education, and occupation, although it often agrees well with the proba- 
bility samples on questions of opinion and attitude. 


5A.11 ESTIMATION FROM A SAMPLE OF THE GAIN DUE 
TO STRATIFICATION 


When a stratified random sample has been taken, it may be of interest, as a
guide to the conduct of future surveys, to appraise the gain in precision relative to
simple random sampling.

The data available from the sample are the values of $N_h$, $n_h$, $\bar{y}_h$, and $s_h^2$. From
section 5.4, the estimated variance of the weighted mean from the stratified
sample is, by formula (5.13),

$$v(\bar{y}_{st}) = \sum_h \frac{W_h^2 s_h^2}{n_h} - \sum_h \frac{W_h s_h^2}{N}$$

The problem is to compare this variance with an estimate of the variance of the
mean that would have been obtained from a simple random sample. One
procedure sometimes used calculates the familiar mean square deviation from the
sample mean,

$$s^2 = \frac{\sum_h \sum_i (y_{hi} - \bar{y})^2}{n - 1}$$

where the strata are ignored. This is taken as an estimate of $S^2$, so that $v_{ran} =
(N - n)s^2/Nn$ for the mean of a simple random sample. This method works well
enough if the allocation is proportional, since a simple random sample distributes
itself approximately proportionally among strata. But if an allocation far from
proportional has been adopted, the sample actually taken does not resemble a
simple random sample, and this $s^2$ may be a poor estimator. A general procedure
is given, the proof being due to J. N. K. Rao (1962).

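Theorem 5A.1. Given the results of a stratified random sample, an unbiased
estimator of $V_{ran}$, the variance of the mean of a simple random sample from the
same population, is

$$v_{ran} = \frac{(N - n)}{n(N - 1)}\left[\frac{1}{N}\sum_h \frac{N_h}{n_h}\sum_{i=1}^{n_h} y_{hi}^2 - \bar{y}_{st}^2 + v(\bar{y}_{st})\right] \tag{5A.44}$$

where $v(\bar{y}_{st})$ is the usual unbiased estimator of $V(\bar{y}_{st})$.

Proof.

$$V_{ran} = \frac{(N - n)}{nN}S^2 = \frac{(N - n)}{n(N - 1)}\left[\frac{1}{N}\sum_h \sum_{i=1}^{N_h} y_{hi}^2 - \bar{Y}^2\right] \tag{5A.45}$$

Now

$$E\left[\frac{1}{N}\sum_h \frac{N_h}{n_h}\sum_{i=1}^{n_h} y_{hi}^2\right] = \frac{1}{N}\sum_h \sum_{i=1}^{N_h} y_{hi}^2 \tag{5A.46}$$

Also, since $v(\bar{y}_{st})$ and $\bar{y}_{st}$ are unbiased estimators of $V(\bar{y}_{st})$ and $\bar{Y}$, respectively,

$$E\,v(\bar{y}_{st}) = V(\bar{y}_{st}) = E(\bar{y}_{st}^2) - \bar{Y}^2 \tag{5A.47}$$

and hence an unbiased estimator of $\bar{Y}^2$ in (5A.45) is

$$\bar{y}_{st}^2 - v(\bar{y}_{st}) \tag{5A.48}$$

From (5A.46) and (5A.48) it follows that an unbiased sample estimator of $V_{ran}$ in
(5A.45) is

$$v_{ran} = \frac{(N - n)}{n(N - 1)}\left[\frac{1}{N}\sum_h \frac{N_h}{n_h}\sum_{i=1}^{n_h} y_{hi}^2 - \bar{y}_{st}^2 + v(\bar{y}_{st})\right] \tag{5A.44}$$

This proves Theorem 5A.1.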

With proportional allocation ($N_h/n_h = N/n$), the first two terms inside the
square brackets become (1/n) times the within-sample sum of squares, that is,
$(n - 1)s^2/n$. Formula (5A.44) then reduces to

$$v_{ran} = \frac{(N - n)}{n(N - 1)}\left[\frac{(n - 1)}{n}s^2 + v(\bar{y}_{st})\right] \tag{5A.49}$$

If n is large, $(n - 1) \doteq n$, $(N - 1) \doteq N$, and the term in $v(\bar{y}_{st})$ is of order 1/n
relative to the term in $s^2$ in (5A.49). Hence

$$v_{ran} \doteq \frac{(N - n)}{nN}s^2 \tag{5A.50}$$

for proportional allocation. In the general case the corresponding simplification (n
large) is

$$v_{ran} \doteq \frac{(N - n)}{nN}\left[\frac{1}{N}\sum_h \frac{N_h}{n_h}\sum_{i=1}^{n_h} y_{hi}^2 - \bar{y}_{st}^2\right] \tag{5A.51}$$


Example. The calculations are illustrated from the first three strata in the sample of
teachers’ colleges (section 5.9). The data in Table 5A.13 are for the later 1946 sample. The
means represent enrollment per college in thousands. The $s_h^2$ values are slightly higher than
in the second edition, owing to a correction.

TABLE 5A.13
BASIC DATA FROM A STRATIFIED SAMPLE OF TEACHERS’ COLLEGES

Stratum    $N_h$    $n_h$    $\bar{y}_h$     $s_h^2$     $(N_h/n_h)\sum_i y_{hi}^2$
   1        13       9      2.200     1.8173        83.920
   2        18       7      1.638     0.0735        49.429
   3        26      10      0.992     0.0859        27.596
 Total      57      26                             160.945


With n small we use formula (5A.44). We find $\bar{y}_{st} = 1.4715$. The values of the $y_{hi}$ for the
sample were not reported, but the figures in the right-hand column can be obtained from
preceding columns of Table 5A.13. The formulas work out as follows.

$$v(\bar{y}_{st}) = \frac{1}{N^2}\sum_h \frac{N_h(N_h - n_h)s_h^2}{n_h} = 0.00497$$

$$v_{ran} = \frac{31}{(26)(56)}\left[\frac{160.945}{57} - (1.4715)^2 + 0.00497\right] = 0.01412$$

Stratification appears to have reduced the variance to about one third of the value for a
simple random sample, the estimated deff factor (section 4.11) being 0.00497/0.01412 =
0.35.
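
The arithmetic of this example can be retraced from the first four data columns of Table 5A.13. The Python sketch below is an illustration with hypothetical variable names. It recovers $\sum_i y_{hi}^2$ in each stratum from the identity $\sum_i y_{hi}^2 = (n_h - 1)s_h^2 + n_h\bar{y}_h^2$, which is how the right-hand column can be obtained from the preceding ones, and then applies formula (5A.44).

```python
# (N_h, n_h, ybar_h, s_h^2) for the three strata of Table 5A.13.
strata = [
    (13, 9, 2.200, 1.8173),
    (18, 7, 1.638, 0.0735),
    (26, 10, 0.992, 0.0859),
]
N = sum(N_h for N_h, _, _, _ in strata)
n = sum(n_h for _, n_h, _, _ in strata)

# Weighted mean and its estimated variance under stratified sampling.
y_st = sum(N_h * ybar for N_h, _, ybar, _ in strata) / N
v_st = sum(N_h * (N_h - n_h) * s2 / n_h for N_h, n_h, _, s2 in strata) / N**2

# Sum over strata of (N_h / n_h) * sum(y_hi^2), using the identity above.
weighted_ss = sum(
    N_h / n_h * ((n_h - 1) * s2 + n_h * ybar**2) for N_h, n_h, ybar, s2 in strata
)

# Formula (5A.44): estimated variance for a simple random sample of size n.
v_ran = (N - n) / (n * (N - 1)) * (weighted_ss / N - y_st**2 + v_st)

print(round(y_st, 4), round(v_st, 5), round(v_ran, 5))
print("estimated deff:", round(v_st / v_ran, 2))
```

Small differences in the last digit of intermediate figures, compared with the table, come from rounding in the published columns.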


5A.12 ESTIMATION OF VARIANCE WITH ONE UNIT 
PER STRATUM 


If the population is highly variable and many effective criteria for stratification
are known, stratification may be carried to the point at which the sample contains
only one unit in each stratum. In this event the formulas previously given for
estimating $V(\bar{y}_{st})$ and $V(\hat{Y}_{st})$ cannot be used. With L even, an estimate may be
attempted by grouping the strata in pairs thought beforehand to have roughly
equal true stratum totals. The allocation into pairs should be made before seeing
the sample results, for reasons that will become evident.

Let the sample observations in a typical pair be $y_{j1}$, $y_{j2}$, where j goes from 1 to
L/2. Let $\hat{Y}_{j1} = N_{j1}y_{j1}$, $\hat{Y}_{j2} = N_{j2}y_{j2}$ be the estimated stratum totals. Now

$$\hat{Y}_{j1} - \hat{Y}_{j2} = (Y_{j1} - Y_{j2}) + (\hat{Y}_{j1} - Y_{j1}) - (\hat{Y}_{j2} - Y_{j2}) \tag{5A.52}$$

Hence, averaging over all samples from this pair,

$$E(\hat{Y}_{j1} - \hat{Y}_{j2})^2 = (Y_{j1} - Y_{j2})^2 + N_{j1}(N_{j1} - 1)S_{j1}^2 + N_{j2}(N_{j2} - 1)S_{j2}^2 \tag{5A.53}$$

For $V(\hat{Y}_{st})$ consider the estimate

$$v_1(\hat{Y}_{st}) = \sum_{j=1}^{L/2} (\hat{Y}_{j1} - \hat{Y}_{j2})^2 \tag{5A.54}$$

By (5A.53) the expected value of this quantity is

$$E\,v_1(\hat{Y}_{st}) = \sum_{h=1}^{L} N_h(N_h - 1)S_h^2 + \sum_{j=1}^{L/2} (Y_{j1} - Y_{j2})^2 \tag{5A.55}$$

The first term on the right is the correct variance (by theorem 5.4 with $n_h = 1$).
The second term represents a positive bias, whose size depends on the success
attained in selecting pairs of strata whose true totals differ little. The form of the
estimate (5A.54) warns that construction of pairs by making the sample estimated
totals differ as little as possible can give a serious underestimate. The technique is
called the method of “collapsed strata.”

With L odd, at least one group must, of course, be of size different from 2. The
extension of the estimate (5A.54) to G groups of any chosen sizes $L_j \ge 2$ is

$$v_1(\hat{Y}_{st}) = \sum_{j=1}^{G} \frac{L_j}{L_j - 1}\sum_{k=1}^{L_j} \left(\hat{Y}_{jk} - \frac{\hat{Y}_j}{L_j}\right)^2 \tag{5A.56}$$

where $\hat{Y}_j$ is the estimated total for group j. For $L_j = 2$, when $\hat{Y}_j = \hat{Y}_{j1} + \hat{Y}_{j2}$, this
form agrees with (5A.54). As with (5A.54), the expectation of this $v_1(\hat{Y}_{st})$ gives
the correct variance $V(\hat{Y}_{st})$, plus a positive bias found by substituting $Y_{jk}$ and $Y_j$
for $\hat{Y}_{jk}$ and $\hat{Y}_j$ in (5A.56).
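
The collapsed-strata estimate (5A.56) is simple to compute once the groups have been fixed in advance. The Python sketch below uses a hypothetical function name and made-up estimated stratum totals. Each inner list holds the estimated totals $\hat{Y}_{jk}$ of the strata collapsed into one group.

```python
def collapsed_strata_variance(groups):
    """Estimate (5A.56): groups[j] holds the estimated stratum totals of group j."""
    total = 0.0
    for totals in groups:
        size = len(totals)
        if size in (0, 1):
            raise ValueError("each group needs at least two strata")
        share = sum(totals) / size  # Y_hat_j / L_j
        total += size / (size - 1) * sum((t - share) ** 2 for t in totals)
    return total


# Hypothetical estimated stratum totals, grouped before seeing the data.
pairs = [[10.0, 14.0], [20.0, 17.0]]
v1_pairs = collapsed_strata_variance(pairs)

# With L odd, one group of three strata can be used alongside the pairs.
v1_mixed = collapsed_strata_variance([[10.0, 14.0], [1.0, 2.0, 3.0]])
print(v1_pairs, v1_mixed)
```

For groups of size 2 the function reduces to the sum of squared differences in (5A.54): here $(10 - 14)^2 + (20 - 17)^2$. The code only carries out the arithmetic and cannot remove the positive bias that arises when the true totals within a group differ.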

When an auxiliary variate $A_h$ is known for each stratum that predicts the
stratum total $Y_h$, Hansen, Hurwitz, and Madow (1953) suggested the alternative
variance estimator

$$v_2(\hat{Y}_{st}) = \sum_{j=1}^{G} \frac{L_j}{L_j - 1}\sum_{k=1}^{L_j} \left(\hat{Y}_{jk} - \frac{A_{jk}\hat{Y}_j}{A_j}\right)^2 \tag{5A.57}$$

If $A_h$ is a good predictor, the positive bias term in $v_2$, coming from the
deviations $(Y_{jk} - A_{jk}Y_j/A_j)^2$, is likely to be smaller than the corresponding term in
$v_1$, although unlike $v_1$, $v_2$ also gives a biased estimate of the term in the $S_h^2$ in
$V(\hat{Y}_{st})$. Hartley, Rao, and G. Kiefer (1969) found $v_2$ less biased than $v_1$ in two of
three populations, with little difference in the third.

These authors developed a method that does not involve the collapsing of
strata. This method uses one or more auxiliary variates $x_{1h}$, $x_{2h}$, and so forth, on
which the true stratum means $\bar{Y}_h$ are thought to have a linear regression. If $y_h$ is
the sample value in stratum h, the method uses the deviations

$$d_h = y_h - \bar{y} - \sum_i b_i(x_{ih} - \bar{x}_i) \tag{5A.58}$$

The variance-covariance matrix of the $d_h$ can be expressed as a linear function
of the $\sigma_h^2$, plus certain bias terms. By inverting this relation, estimates $\hat{\sigma}_h^2$ are
obtained, where $\sigma_h^2 = (N_h - 1)S_h^2/N_h$, giving

$$v(\hat{Y}_{st}) = \sum_{h=1}^{L} N_h^2\hat{\sigma}_h^2 \tag{5A.59}$$

The method appears promising and extends to ratio estimates, but the authors
warn that additional comparisons with “collapsed strata” methods are needed.

Using a different approach, Fuller (1970) developed a method of stratum
construction that provides an unbiased sample estimate of $V(\hat{Y}_{st})$ with one unit
per stratum. For simplicity suppose that N/n = N/L = k (an integer). Select a
random number r between 1 and k. The first stratum consists of the units
numbered from (r + 1) up to (r + k), the second those numbered from (r + k + 1) up
to (r + 2k), and so on, the last (Lth and nth stratum) those numbered from
r + (n - 1)k + 1 to N = nk and those from 1 to r. At first sight, this last stratum may
look a poor choice. As Fuller notes, however, this method can work well in
geographic stratification with areal units. Here, stratification usually leans on the
notion that units near one another tend to be similar. By numbering units in
serpentine fashion, one can have $y_N$ near $y_1$, so that the stratum that includes both
$y_N$ and $y_1$ can also be internally homogeneous. The estimate $v(\hat{Y}_{st})$ is a weighted
sum of the differences $(y_h - y_{h+1})^2$.
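
The circular construction of strata can be sketched in a few lines of Python. The function name `circular_strata` is hypothetical, and the sketch covers only the formation of strata from the random start r. It does not implement the variance estimate, whose weights are not given in this section.

```python
def circular_strata(N, n, r):
    """Unit numbers (1-based) in each of the n strata for a random start r."""
    k, remainder = divmod(N, n)
    if remainder != 0 or r not in range(1, k + 1):
        raise ValueError("need N = n * k and a start r between 1 and k")
    strata = []
    for h in range(n):
        first = r + h * k  # stratum h + 1 holds units first + 1, ..., first + k
        strata.append([(first + i) % N + 1 for i in range(k)])
    return strata


# Hypothetical population of N = 12 units, n = 3 strata, random start r = 2.
example = circular_strata(12, 3, 2)
print(example)
```

In the example the last stratum wraps around the end of the list and combines units 11 and 12 with units 1 and 2, which is the feature discussed in the text.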

The circular method would be less effective for a population showing a rising
trend from $y_1$ to $y_N$, in which the stratum including both $y_1$ and $y_N$ would have
large internal variability. For this situation Fuller gives a second plan, slightly
more complex, which should give good precision with a rising trend and also
furnishes an unbiased estimate of $V(\hat{Y}_{st})$.


5A.13 STRATA AS DOMAINS OF STUDY


This section deals with surveys in which the primary purpose is to make
comparisons between different strata, assumed to be identifiable in advance. The
rules for allocating the sample sizes to the strata are different from those that apply
when the objective is to make over-all population estimates. If there are only two
strata, we might choose $n_1$, $n_2$ to minimize the variance of the difference $(\bar{y}_1 - \bar{y}_2)$
between the estimated strata means. Omitting the fpc’s for reasons given in
section 2.14, we have

$$V(\bar{y}_1 - \bar{y}_2) = \frac{S_1^2}{n_1} + \frac{S_2^2}{n_2} \tag{5A.60}$$

With a linear cost function

$$C = c_0 + c_1 n_1 + c_2 n_2 \tag{5A.61}$$

V is minimized when

$$n_1 = \frac{nS_1/\sqrt{c_1}}{S_1/\sqrt{c_1} + S_2/\sqrt{c_2}}, \qquad n_2 = \frac{nS_2/\sqrt{c_2}}{S_1/\sqrt{c_1} + S_2/\sqrt{c_2}} \tag{5A.62}$$

With L strata, L > 2, the optimum allocation depends on the amounts of precision
desired for different comparisons. For instance, the cost might be minimized
subject to the set of L(L - 1)/2 conditions that $V(\bar{y}_h - \bar{y}_i) \le V_{hi}$, where the values
of $V_{hi}$ are chosen according to the precision considered necessary for a satisfactory
comparison of strata h and i.

Frequently a simpler method of allocation is adequate, especially if the $S_h$ and
$c_h$ do not differ greatly. One approach is to minimize the average variance of the
difference between all L(L - 1)/2 pairs of strata, that is, to minimize

$$\bar{V} = \frac{2}{L}\left(\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2} + \cdots + \frac{S_L^2}{n_L}\right) \tag{5A.63}$$

$\bar{V}$ is minimized, for fixed C, by the rule in (5A.62),

$$n_h = \frac{nS_h/\sqrt{c_h}}{\sum (S_h/\sqrt{c_h})} \tag{5A.64}$$


This rule may result in certain pairs of strata being more precisely compared and
others less precisely than is felt appropriate. An alternative is to select the $n_h$ so
that the s.e. of the difference is the same, say $\sqrt{V}$, for every pair of strata. This
amounts to making $S_h^2/n_h = V/2$ for every stratum. For a fixed cost this method
gives less over-all precision than the first method. The reader may verify that the
two optimum allocations give

$$\bar{V} = \frac{2\left(\sum S_h\sqrt{c_h}\right)^2}{L(C - c_0)}, \qquad V = \frac{2\left(\sum S_h^2 c_h\right)}{C - c_0} \tag{5A.65}$$

It follows from the Cauchy-Schwarz inequality that V is always greater than $\bar{V}$
unless $S_h\sqrt{c_h}$ = constant. If V is substantially greater than $\bar{V}$, a compromise
allocation can sometimes be found, after a little trial and error, that will give an
average variance close to $\bar{V}$ and also keep $V(\bar{y}_h - \bar{y}_i)$ reasonably constant.
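
The two allocations can be compared numerically. In the Python sketch below the function names and the input values are hypothetical, and `budget` stands for $C - c_0$. The first function scales rule (5A.64) so that the variable cost $\sum c_h n_h$ equals the budget. The second sets $S_h^2/n_h = V/2$ in every stratum at the same cost. Sample sizes are left unrounded for clarity.

```python
import math


def min_average_variance(S, c, budget):
    """Allocation (5A.64), scaled so that sum(c_h * n_h) equals budget."""
    scale = budget / sum(s * math.sqrt(ch) for s, ch in zip(S, c))
    n_h = [scale * s / math.sqrt(ch) for s, ch in zip(S, c)]
    v_bar = 2.0 / len(S) * sum(s**2 / n for s, n in zip(S, n_h))
    return n_h, v_bar


def equal_precision(S, c, budget):
    """Allocation with S_h^2 / n_h = V / 2 in every stratum, same budget."""
    V = 2.0 * sum(s**2 * ch for s, ch in zip(S, c)) / budget
    n_h = [2.0 * s**2 / V for s in S]
    return n_h, V


# Hypothetical inputs: three strata, unit costs equal, budget C - c0 = 60.
S = [1.0, 2.0, 3.0]
c = [1.0, 1.0, 1.0]
n_avg, v_bar = min_average_variance(S, c, budget=60.0)
n_eq, v_equal = equal_precision(S, c, budget=60.0)
print(n_avg, v_bar)
print(n_eq, v_equal)
```

Both results agree with the closed forms in (5A.65), and the equal-precision value is the larger of the two, as the Cauchy-Schwarz argument requires.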

Sometimes the objective is to obtain estimates for each stratum as well as
over-all estimates for the whole population. In planning the survey, we might
specify the following conditions.

$$V(\bar{y}_h) = \frac{S_h^2}{n_h}(1 - f_h) \le V_h, \qquad V(\bar{y}_{st}) = \sum_h \frac{W_h^2 S_h^2}{n_h}(1 - f_h) \le V$$

The fpc terms are now included, since the purpose is to specify the precision with
which the means in the finite population are to be estimated. The conditions on the
$V(\bar{y}_h)$ determine lower limits to the values of the $n_h$. If these lower limits are found
to satisfy the condition on $V(\bar{y}_{st})$, the allocation problem is solved. When the
condition on $V(\bar{y}_{st})$ is not satisfied, Dalenius (1957) has indicated a graphical
approach.

More complex problems arise when the $L = 2^k$ strata represent all combinations
of k factors each at two levels, and the objective is to estimate the average
effects of the factors. If the stratum or cell to which any member of the population
belongs is known in advance of sampling, a sample of desired size $n_h$ can be
drawn from stratum h. For 2, 3, or 4 factors Sedransk (1967) has given methods
for finding the $n_h$ that minimize the cost under different specifications about the
variances of the estimated average effects of the factors and the desired power of a
test for interactions.


5A.14 ESTIMATING TOTALS AND MEANS OVER
SUBPOPULATIONS 


Frequently the subpopulations or domains of study are represented in all strata. 
If stratification is geographic, for example, separate estimates may be wanted, 
over the whole population, for males and females, for different age groups, for 
users and nonusers of Blank’s toothpaste, and the like. The problem presents 
some complications. The basic formulas were given by Yates (1953) with further 
discussion and proofs by Durbin (1958) and Hartley (1959). Methods applicable 
to a single stratum are discussed in sections 2.10 and 2.11. 


The following notation applies to the units in stratum $h$ that lie in domain $j$.

Notation.

Number of units: $N_{hj}$; $\sum_j N_{hj} = N_h$
Number in sample: $n_{hj}$; $\sum_j n_{hj} = n_h$
Measurement on individual unit: $y_{hij}$
Sample mean: $\bar y_{hj} = \sum_{i=1}^{n_{hj}} y_{hij}/n_{hj}$
Domain mean: $\bar Y_{hj} = \sum_{i=1}^{N_{hj}} y_{hij}/N_{hj}$


The population total and mean for domain $j$ over all strata are, respectively,

$$Y_j = \sum_h N_{hj}\bar Y_{hj}, \qquad \bar Y_j = \frac{Y_j}{N_j}$$

where $N_j = \sum_h N_{hj}$.


The complication arises because the $n_{hj}$ are random variables. If the $N_{hj}$ were
known, the problem would be simple. As estimates of $Y_j$ and $\bar Y_j$ we could use

$$\hat Y_j = \sum_h N_{hj}\bar y_{hj}, \qquad \hat{\bar Y}_j = \frac{\hat Y_j}{N_j}$$

By the method in section 2.12, the ordinary formula for $V(\bar y_{hj})$ is still valid,
provided all $n_{hj} > 0$. Thus

$$V(\hat Y_j) = \sum_h \frac{N_{hj}^2 S_{hj}^2}{n_{hj}}\left(1 - \frac{n_{hj}}{N_{hj}}\right) \qquad (5A.66)$$

where $S_{hj}^2$ is the variance among units in domain $j$ within stratum $h$. In
applications, however, the $N_{hj}$ are rarely known.


Estimating Domain Totals 


In default of the $N_{hj}$, each stratum total of the domain is estimated as in section
2.13. These totals are added to obtain an estimated domain total, that is,

$$\hat Y_j = \sum_h \frac{N_h}{n_h}\sum_{i=1}^{n_{hj}} y_{hij} \qquad (5A.67)$$


The true and estimated variance of $\hat Y_j$ are found by the device used in section
2.13. A variate $y'_{hi}$ is introduced that equals $y_{hij}$ for all units in domain $j$
and equals zero for all other units in the population. As shown in section 2.13 this
gives for the estimated variance

$$v(\hat Y_j) = \sum_h \frac{N_h^2(1-f_h)}{n_h(n_h-1)}\left[\sum_{i=1}^{n_{hj}} y_{hij}^2 - \frac{\left(\sum_{i=1}^{n_{hj}} y_{hij}\right)^2}{n_h}\right] \qquad (5A.68)$$


Estimating Domain Means 


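In order to estimate the domain mean $\bar Y_j = Y_j/N_j$, a sample estimate of $N_j$ is
required. An unbiased estimate is

$$\hat N_j = \sum_h \frac{N_h}{n_h}\,n_{hj} \qquad (5A.69)$$

Hence we take

$$\hat{\bar Y}_j = \frac{\hat Y_j}{\hat N_j} = \frac{\sum_h (N_h/n_h)\sum_{i=1}^{n_{hj}} y_{hij}}{\sum_h (N_h/n_h)\,n_{hj}} \qquad (5A.70)$$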


With proportional stratification, $\hat{\bar Y}_j$ reduces to the ordinary sample mean of
the units that fall in domain $j$. In the general case, this estimate is known as a
combined ratio estimate, discussed later in section 6.11. To show it, introduce another
dummy variate $x'_{hi}$ which equals 1 for every unit in domain $j$ and 0 for all other
units, where $i$ now goes from 1 to $N_h$. Clearly,

$$\bar x'_h = \frac{\sum_{i=1}^{n_h} x'_{hi}}{n_h} = \frac{n_{hj}}{n_h}, \qquad \bar y'_h = \frac{\sum_{i=1}^{n_h} y'_{hi}}{n_h} = \frac{\sum_{i=1}^{n_{hj}} y_{hij}}{n_h} = \frac{n_{hj}}{n_h}\,\bar y_{hj} \qquad (5A.71)$$

so that the estimated domain mean may be written

$$\hat{\bar Y}_j = \frac{\sum_h (N_h/n_h)\sum_{i=1}^{n_{hj}} y_{hij}}{\sum_h (N_h/n_h)\,n_{hj}} = \frac{\sum_h N_h\bar y'_h}{\sum_h N_h\bar x'_h} \qquad (5A.72)$$


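This is the formula for the combined ratio estimate for the two variables $y'_{hi}$ and
$x'_{hi}$. From section 6.11, the estimated variance may be expressed approximately
as

$$v(\hat{\bar Y}_j) = \frac{1}{\hat N_j^2}\sum_h \frac{N_h^2(1-f_h)}{n_h(n_h-1)}\sum_{i=1}^{n_h}\left[(y'_{hi} - \bar y'_h) - \hat{\bar Y}_j(x'_{hi} - \bar x'_h)\right]^2 \qquad (5A.73)$$

The second summation may be written

$$\sum_{i=1}^{n_h}(y'_{hi} - \hat{\bar Y}_j x'_{hi})^2 - n_h(\bar y'_h - \hat{\bar Y}_j\bar x'_h)^2 = \sum_{i=1}^{n_{hj}}(y_{hij} - \hat{\bar Y}_j)^2 - \frac{n_{hj}^2}{n_h}(\bar y_{hj} - \hat{\bar Y}_j)^2 \qquad (5A.74)$$

using (5A.71). Furthermore, the first term on the right of (5A.74) can be expressed
alternatively as

$$\sum_{i=1}^{n_{hj}}(y_{hij} - \bar y_{hj})^2 + n_{hj}(\bar y_{hj} - \hat{\bar Y}_j)^2$$

Inserting these results in (5A.73) gives, finally, for the estimated variance,

$$v(\hat{\bar Y}_j) = \frac{1}{\hat N_j^2}\sum_h \frac{N_h^2(1-f_h)}{n_h(n_h-1)}\left[\sum_{i=1}^{n_{hj}}(y_{hij} - \bar y_{hj})^2 + n_{hj}\left(1 - \frac{n_{hj}}{n_h}\right)(\bar y_{hj} - \hat{\bar Y}_j)^2\right] \qquad (5A.75)$$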

The term on the right represents a between-stratum contribution to the
variance. Differences among strata means are not entirely eliminated from the
variance of the estimated mean of any subpopulation. The between-stratum
contribution is small if the terms $1 - n_{hj}/n_h$ are small, that is, if the
subpopulation is almost as large as the complete population.
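
The domain estimates of this section can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function name `domain_estimates`, the dictionary keys `N_h`, `n_h`, `y_dom`, and the numbers in the usage lines are hypothetical and do not come from the text. The code follows (5A.67) to (5A.70) for the point estimates and (5A.68) and (5A.75) for the estimated variances, and it assumes every stratum has a sample of at least two units.

```python
def domain_estimates(strata):
    """Domain total, size and mean from a stratified random sample.

    strata: list of dicts with keys
      N_h   -- stratum size
      n_h   -- stratum sample size (at least 2)
      y_dom -- y-values of the sampled units that fall in the domain
    """
    Y_hat = 0.0  # estimated domain total, (5A.67)
    N_hat = 0.0  # estimated domain size, (5A.69)
    for s in strata:
        w = s["N_h"] / s["n_h"]
        Y_hat += w * sum(s["y_dom"])
        N_hat += w * len(s["y_dom"])
    mean_hat = Y_hat / N_hat  # combined ratio estimate, (5A.70)

    v_total = 0.0  # estimated variance of the domain total, (5A.68)
    v_mean = 0.0   # estimated variance of the domain mean, (5A.75)
    for s in strata:
        N_h, n_h, y = s["N_h"], s["n_h"], s["y_dom"]
        n_hj = len(y)
        factor = N_h ** 2 * (1 - n_h / N_h) / (n_h * (n_h - 1))
        sum_y = sum(y)
        sum_y2 = sum(v * v for v in y)
        v_total += factor * (sum_y2 - sum_y ** 2 / n_h)
        if n_hj > 0:
            ybar_hj = sum_y / n_hj
            within = sum((v - ybar_hj) ** 2 for v in y)
            between = n_hj * (1 - n_hj / n_h) * (ybar_hj - mean_hat) ** 2
            v_mean += factor * (within + between)
    v_mean /= N_hat ** 2
    return Y_hat, N_hat, mean_hat, v_total, v_mean


# Hypothetical two-stratum sample with proportional allocation.
example = [
    {"N_h": 100, "n_h": 10, "y_dom": [2.0, 4.0, 6.0]},
    {"N_h": 200, "n_h": 20, "y_dom": [10.0, 10.0, 10.0, 10.0]},
]
Y_hat, N_hat, mean_hat, v_total, v_mean = domain_estimates(example)
```

Because the allocation in this made-up example is proportional, `mean_hat` equals the ordinary mean of the seven sampled domain units, as the text states for proportional stratification.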

As Durbin (1958) has pointed out, (5A.75) applies also to means estimated for
the whole population, if the sample is incomplete for any reason such as
nonresponse, provided, of course, that $\hat{\bar Y}_j$ is the estimate used. In this
event $\hat{\bar Y}_j$ is interpreted as the estimated mean for the part of the
population that would give a response under the methods of data collection employed.
There is, however, an additional complication, in that the “nonresponse” part of the
population often has a different mean from the “response” part. Thus $\hat{\bar Y}_j$
is a biased estimate of the mean in the whole population, and this bias contribution
is not included in (5A.75).


5A.15 SAMPLING FROM TWO FRAMES 


An early example of the use of a sample from a list B of large businesses in
combination with a sample from an areal frame A that covers the complete
population is the 1949 sample survey of retail stores taken by the Census Bureau
and described in Hansen, Hurwitz, and Madow (1953, p. 516 ff.). The objectives
in this combined use of an incomplete list frame and a complete areal frame were
to gain increased accuracy and save money. Large businesses can sometimes be
sampled cheaply by a combination of mail and telephone; moreover, being large,
they are often the businesses that have the largest variance for the y variables
being measured. Placing them in a separate stratum with optimum allocation (or
100 per cent sampling if this seems close to optimum) can produce substantial
increases in accuracy. In the retail stores survey, businesses in the list frame that
were present in the area sample were identified and removed from the area
sample, so that the population being sampled fell into two distinct strata. This
process, called unduplication, was performed by the field supervisor in advance of
sampling. Unduplication is sometimes complicated and subject to errors. Some of
the practical difficulties are discussed by Hansen, Hurwitz and Jabine (1963), as
well as various ways in which incomplete lists can be helpful.

Identification of the sample members from the frame A sample that belong to
the list frame B sometimes requires measurement of the y variables for these
members. The sampler then has three samples at his disposal in which y has been
measured: a sample from stratum a (the part of A that does not belong to B) and
two independent samples from stratum B. One, denoted by the suffix ab, is the
sample obtained from A and identified as belonging to B, and one is obtained by
direct sampling of frame B. With frame A complete and simple random sampling
from both frames, Hartley (1962) proposed that both B samples be used in the
poststratified estimate

$$\hat Y = N_a\bar y_a + N_{ab}(p\,\bar y_{ab} + q\,\bar y_B) \qquad (5A.76)$$

where $\bar y_a$, $\bar y_{ab}$, $\bar y_B$ denote the respective sample means. The
weighting factors p and q for the two samples that belong to frame B, with p + q = 1,
are chosen to minimize $V(\hat Y)$ under a cost function of the form

$$C = c_A n_A + c_B n_B \qquad (5A.77)$$

In (5A.76) the stratum sizes $N_{ab} = N_B$ and $N_a = (N_A - N_B)$ will, of course, be
known.

With $S_B^2 > S_a^2$ and $c_B < c_A$ Hartley showed that this method can give large
reductions in $V(\hat Y)$ as compared with sampling from frame A only, even if the
frame A sample is poststratified into the two strata a and B = ab.

The problem becomes more difficult if frame A is also incomplete, two frames
A and B, with some duplication, being required to obtain complete coverage of
the population. For poststratification, there are three distinct strata: a (units in A
alone); ab (units in both A and B); and b (units in B alone). The three strata
cannot be sampled directly, samples of sizes $n_A$, $n_B$ having to be drawn from
frames A and B. Furthermore, the strata sizes $N_a$, $N_{ab}$, $N_b$ will not usually be
known. For simple random sampling from frames A and B, Hartley (1962)
suggested the estimate

$$\hat Y_H = \frac{N_A}{n_A}(y_a + p\,y_{ab}) + \frac{N_B}{n_B}(q\,y_{ba} + y_b) \qquad (5A.78)$$

where, as before, p + q = 1 and the y's are sample totals in the strata, the suffixes
ab and ba denoting the samples in the duplicate stratum found in $n_A$ and $n_B$.
Hartley determined p and q to minimize $V(\hat Y_H)$ for fixed cost. Improvements in
Hartley's estimate (5A.78) have been given by Lund (1968) and Fuller and
Burmeister (1972), essentially by using better estimates of $N_a$, $N_{ab}$, $N_b$ than are
implied in Hartley's estimate. Fuller and Burmeister (1972) also dealt with the
case in which frame A is areal, with subsampling of the areal units. Hartley (1974)
gives a general approach to two-frame sampling, applicable to any sample design
in the two frames.
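
As a small illustration of how the two-frame estimate (5A.78) combines the four sample totals, the sketch below may help. The function name `hartley_two_frame_total`, its argument names, and the numbers are hypothetical; the choice of p is left to the caller, since the variance-minimizing value depends on quantities not given here.

```python
def hartley_two_frame_total(y_a, y_ab, y_ba, y_b, N_A, n_A, N_B, n_B, p):
    """Two-frame estimate of the population total in the form of (5A.78).

    y_a, y_ab -- sample totals from the frame A sample (A alone, overlap)
    y_ba, y_b -- sample totals from the frame B sample (overlap, B alone)
    p         -- weight given to the frame A sample in the overlap; q = 1 - p
    """
    q = 1.0 - p
    return (N_A / n_A) * (y_a + p * y_ab) + (N_B / n_B) * (q * y_ba + y_b)


# Hypothetical check: with complete enumeration of both frames the
# estimate reproduces the population total whatever the value of p.
census = hartley_two_frame_total(
    y_a=10.0, y_ab=20.0, y_ba=20.0, y_b=5.0,
    N_A=30, n_A=30, N_B=25, n_B=25, p=0.3,
)
```

In this made-up census case the overlap total enters once, split as p and q between the two frames, so the result is 10 + 20 + 5 = 35.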


EXERCISES 


5A.1 In planning a survey of sales in a certain type of store, with n = 550, good
estimates of $S_h$ are available from a previous survey in two of the three strata. The
third stratum consists of new stores and stores that had no sales in the previous
survey, so that a value for $S_3$ has to be guessed. If $S_3$ is actually 10, compute
$V(\bar y_{st})$ as given by an estimated Neyman allocation when $S_3$ is guessed as
(a) 5, (b) 20. Show that in both cases the proportional increase in variance over the
true optimum is slightly over 2%.


                      True      Estimated S_h
Stratum     W_h       S_h       (a)      (b)
   1        0.3       30        30       30
   2        0.6       20        20       20
   3        0.1       10         5       20


5A.2 Show that if all $S_h$ except $S_L$ are correctly estimated and $S_L$ is estimated
as $\hat S_L = S_L(1+\lambda)$, the proportional increase in $V_{min}(\bar y_{st})$, using
$\hat S_L$ instead of the true $S_L$ for Neyman allocation, is

$$\frac{\lambda^2\,n'_L(n - n'_L)}{(1+\lambda)\,n^2}$$

where $n'_L$ is the sample size in stratum L under true Neyman allocation. Verify that
this formula agrees with the results in exercise 5A.1. (The agreement is not exact
because of the rounding of the $n_h$ to integers.) Hence show that a 50%
underestimation of $S_L$ has the same effect as 100% overestimation.

5A.3 If there are two strata and if $\phi$ is the ratio of the actual $n_1/n_2$ to the
Neyman optimum $n_1/n_2$, show that whatever the values of $N_1$, $N_2$, $S_1$ and $S_2$,
the ratio $V_{min}(\bar y_{st})/V(\bar y_{st})$ is never less than $4\phi/(1+\phi)^2$ when
the fpc's are negligible.
5A.4 The results of a simple random sample with n = 1000 can be classified into three
“strata,” with $\bar y_h$ = 10.2, 12.6, and 17.1, $s_h^2$ = 10.82 (the same in each
stratum), and $s^2$ = 17.66. The estimated stratum weights are $w_h$ = 0.5, 0.3, 0.2,
respectively. These weights are known to be inexact, but it is thought that all are
correct within 5%, so that the worst cases are either $W_h$ = 0.525, 0.285, and 0.190
or $W_h$ = 0.475, 0.315, and 0.210. By the methods of section 5A.2, would you recommend
stratification? (Where needed, assume that $\bar y_h = \bar Y_h$ and $s_h^2 = S_h^2$.)

5A.5 In a stratified random sample with two variates the objective is to satisfy the
specifications

$$V(\bar y_{1st}) \le V_1, \qquad V(\bar y_{2st}) \le V_2$$

for minimum cost $C = \sum c_h n_h$. The fpc's can be ignored. (a) Prove the result by
Chatterjee (1972) that a compromise allocation is necessary if

$$\frac{\sum W_h S_{2h}\sqrt{c_h}}{\sum W_h (S_{1h}^2/S_{2h})\sqrt{c_h}} < \frac{V_2}{V_1} < \frac{\sum W_h (S_{2h}^2/S_{1h})\sqrt{c_h}}{\sum W_h S_{1h}\sqrt{c_h}}$$

(b) If $V_2/V_1$ equals or exceeds the upper limit, the optimum allocation for $y_1$
satisfies both tolerances, with a corresponding result about the lower limit.


5A.6 A survey with three strata is planned to estimate the percentage of families who
have accounts in savings banks and the average amount invested per family. Advance
estimates of the percentages P_h and the within-stratum S_h for the amount invested
are as follows.


Stratum     W_h     P_h (%)     S_h ($)
   1        0.6       20           90
   2        0.3       40          180
   3        0.1       70          520


Compute the smallest sample sizes n and the n_h that satisfy the following
requirements: (a) The percentage of families is to be estimated with s.e. = 2 and the
average amount invested with s.e. = $5. (b) The percentage of families is to be
estimated with s.e. = 1.5 and the average amount invested with s.e. = $5.

Part (b) requires a compromise allocation, either by a computer program or the method
in the second edition, p. 123. The allocation n_h = 371, 344, 315, with n = 1030,
satisfies both tolerances. Show that the Booth-Sedransk method (section 5A.4) gives
n_h/n = 0.431, 0.326, 0.243. This allocation would require n = 1073 to meet both
tolerances.

5A.7 The table following exercise 5A.9 shows the frequency distribution of a
population of 911 city sizes for cities from 10,000 to 60,000, arranged in classes of
2000. To shorten the calculations, a coded y' and values of √f, cum √f, cum f, fy',
and Σfy'² are given. Apply the Dalenius-Hodges rule to create two strata for optimum
allocation in the sense of Neyman. Find the values of W_h and S_h for each of your
strata. Verify (a) that the optimum sample sizes are almost the same in the two strata
and (b) by finding S² for the whole population, that


mate 
Vort (Fst) 
5A.8 The right triangular distribution $f(y) = 2(1-y)$, $0 < y < 1$, is divided into two
strata at the point a. (a) Show that

$$W_1 = a(2-a), \qquad W_2 = (1-a)^2$$

$$S_1^2 = \frac{a^2(6 - 6a + a^2)}{18(2-a)^2}, \qquad S_2^2 = \frac{(1-a)^2}{18}$$

(b) Show that under the cum √f rule the best choice of a is $1 - 1/\sqrt[3]{4} = 0.37$
and that with this boundary the optimum $n_1/n_2$ is about 1.08 and $V(\bar y_{st})$ is
about 27% of the value given by simple random sampling.


5A.9 In both exercises 5A.7 and 5A.8, show that the Ekman rule
$W_h(y_h - y_{h-1})$ = constant agrees very closely with the cum √f rule in determining
the stratum boundaries.


Class      f     y'     √f    Cum f   Cum √f    fy'

10-      205      0    14.3     205     14.3      0
12-      135      1    11.6     340     25.9    135
14-      106      2    10.3     446     36.2    212
16-       82      3     9.1     528     45.3    246
18-       61      4     7.8     589     53.1    244
20-       42      5     6.5     631     59.6    210
22-       32      6     5.7     663     65.3    192
24-       30      7     5.5     693     70.8    210
26-       27      8     5.2     720     76.0    216
28-       18      9     4.2     738     80.2    162
30-       22     10     4.7     760     84.9    220
32-       21     11     4.6     781     89.5    231
34-       19     12     4.4     800     93.9    228
36-       16     13     4.0     816     97.9    208
38-       14     14     3.7     830    101.6    196
40-       17     15     4.1     847    105.7    255
42-        9     16     3.0     856    108.7    144
44-        8     17     2.8     864    111.5    136
46-       11     18     3.3     875    114.8    198
48-        9     19     3.0     884    117.8    171
50-        7     20     2.6     891    120.4    140
52-        4     21     2.0     895    122.4     84
54-        5     22     2.2     900    124.6    110
56-        5     23     2.2     905    126.8    115
58-        6     24     2.4     911    129.2    144
Totals   911           129.2                    4407


Σfy'² = 50,395


5A.10 A sum of $5000 is available for a stratified sample. In the notation of section
5A.8 the cost function is thought to be, roughly, C = 200L + 10n and

$$V(\bar y_{st}) = \frac{S_y^2}{n}\left[\frac{\rho^2}{L^2} + (1 - \rho^2)\right]$$

where ρ is the correlation between the variate used to construct the strata and the
variate to be measured in the survey. Compute the optimum L for ρ = 0.95, 0.9, and
0.8. What is a good compromise number of strata to use for all three values of ρ?

5A.11 The following data are derived from a stratified sample of tire dealers taken in
March 1945 (Deming and Simmons, 1946). The dealers were assigned to strata according
to the number of new tires held at a previous census. The sample means $\bar y_h$ are
the mean numbers of new tires per dealer. (a) Estimate the gain in precision due to the
stratification. (b) Compare this result with the gain that would have been attained
from proportional allocation.


Stratum
Boundaries      N_h       W_h      ȳ_h     s_h²     n_h
   1-9        19,850    0.8032     4.1     34.8    3000
  10-19        3,250    0.1315    13.0     92.2     600
  20-29        1,007    0.0407    25.0    174.2     340
  30-39          606    0.0245    38.2    320.4     230
  Totals      24,713    0.9999                     4170


5A.12 A population has two strata of relative sizes $W_1 = 0.8$, $W_2 = 0.2$ and
within-stratum variances $S_1^2 = 100$, $S_2^2 = 400$. A stratified random sample is to
be taken to satisfy the following requirements: (i) the means of each stratum are to
be estimated with variance < 1; (ii) $V(\bar y_{st}) = 0.5$. Ignoring the fpc, find the
values of $n_1$, $n_2$ that satisfy all three requirements for minimum $n = n_1 + n_2$.

Hint. Note that $-\partial V(\bar y_{st})/\partial n_1 > -\partial V(\bar y_{st})/\partial n_2$
if $n_1 < 2n_2$. Fuller (1966) has discussed various methods of handling this problem.

5A.13 In an example due to Nordbotten (1956) and worked by Kokan (1963), a survey
is planned to estimate total employment $Y_1$ and the value of production $Y_2$ in
establishments manufacturing furniture. When establishments are stratified by size,
the $N_h$ and rough estimates of the $S_{jh}^2$ are as follows.


Stratum      N_h      S_1h²      S_2h²

Large         600      200      500,000
Small       1,000       10        4,000
Total       1,600


The requirement that estimates of $Y_1$ and $Y_2$ not be in error by more than 6%
(P = 0.95) amounts to tolerances

$$V_1 = V(\bar y_{1st}) \le 0.0351; \qquad V_2 = V(\bar y_{2st}) \le 56.25$$

Show that the optimum allocation for $y_1$ with $n_1 = 450$, $n_2 = 167$, n = 617
satisfies both tolerances. Note that in this problem the fpc cannot be ignored.

5A.14 In stratified random sampling with one unit per stratum, assume that the strata
can be grouped into pairs with $N_{j1} = N_{j2} = N_j$ (j = 1, 2, ..., L/2). An
alternative sampling method draws two units at random from each pair of strata. Show
that for this method


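$$V(\bar y_{st2}) = \sum_j \frac{2W_j^2(N_j - 1)}{N_j(2N_j - 1)}\left[(N_j - 1)(S_{j1}^2 + S_{j2}^2) + \frac{N_j}{2}(\bar Y_{j1} - \bar Y_{j2})^2\right]$$

where $W_j = N_j/N$. Hence show that the expected value of the “collapsed strata”
estimate $v_1(\bar y_{st})$ in formula (5A.54), section 5A.12, overestimates
$V(\bar y_{st2})$, the variance that would apply if strata twice as large were used.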


CHAPTER 6 
Ratio Estimators 


6.1 METHODS OF ESTIMATION 


One feature of theoretical statistics is the creation of a large body of theory that
discusses how to make good estimates from data. In the development of theory 
specifically for sample surveys, relatively little use has been made of this know- 
ledge. I think there are two principal reasons. First, in surveys that contain a large 
number of items, there is a great advantage, even with computers, in estimation 
procedures that require little more than simple addition, whereas the superior 
methods of estimation in statistical theory, such as maximum likelihood, may 
necessitate a series of successive approximations before the estimate can be 
found. Second, as noted in section 1.4, there has been a difference in attitude in 
the two lines of research. Most of the estimation methods in theoretical statistics 
assume that we know the functional form of the frequency distribution followed 
by the data in the sample, and the method of estimation is carefully geared to this 
type of distribution. The preference in sample survey theory has been to make, at 
most, limited assumptions about this frequency distribution (that it is very skew or 
rather symmetrical) and to leave its specific functional form out of the discussion. 
This preference leads to the use of simple methods of estimation that work well 
under a range of types of frequency distributions. This attitude is a reasonable one 
for handling surveys in which the type of distribution may change from one item to 
another and when we do not wish to stop and examine all of them before deciding 
how to make each estimate. 

Consequently, estimation techniques for sample survey work are at present 
restricted in scope. Two techniques will now be considered—the ratio method in 
this chapter and the linear regression method in Chapter 7. 


6.2 THE RATIO ESTIMATOR 


In the ratio method an auxiliary variate $x_i$, correlated with $y_i$, is obtained for
each unit in the sample. The population total X of the $x_i$ must be known. In
practice, $x_i$ is often the value of $y_i$ at some previous time when a complete census
was taken. The aim in this method is to obtain increased precision by taking
advantage of the correlation between $y_i$ and $x_i$. At present we assume simple
random sampling.

The ratio estimate of Y, the population total of the $y_i$, is

$$\hat Y_R = \frac{y}{x}\,X = \frac{\bar y}{\bar x}\,X \qquad (6.1)$$

where y, x are the sample totals of the $y_i$ and $x_i$, respectively.

If $x_i$ is the value of $y_i$ at some previous time, the ratio method uses the sample
to estimate the relative change Y/X that has occurred since that time. The estimated
relative change y/x is multiplied by the known population total X on the previous
occasion to provide an estimate of the current population total. If the ratio
$y_i/x_i$ is nearly the same on all sampling units, the values of y/x vary little from
one sample to another, and the ratio estimate is of high precision. In another
application $x_i$ may be the total acreage of a farm and $y_i$ the number of acres sown
to some crop. The ratio estimate will be successful in this case if all farmers devote
about the same percentage of their total acreage to this crop.

If the quantity to be estimated is $\bar Y$, the population mean value of $y_i$, the
ratio estimate is

$$\hat{\bar Y}_R = \frac{y}{x}\,\bar X$$

Frequently we wish to estimate a ratio rather than a total or mean, for example,
the ratio of corn acres to wheat acres, the ratio of expenditures on labor to total
expenditures, or the ratio of liquid assets to total assets. The sample estimate is
$\hat R = y/x$. In this case X need not be known. The use of ratio estimates for this
purpose has already been discussed in sections 2.11 and (with cluster sampling for
proportions) 3.12.


Example. Table 6.1 shows the number of inhabitants (in 1000's) in each of a simple
random sample of 49 cities drawn from the population of 196 large cities discussed in
section 2.15. The problem is to estimate the total number of inhabitants in the 196
cities in 1930. The true 1920 total, X, is assumed to be known. Its value is 22,919.
The example is a suitable one for the ratio estimate. The majority of the cities in the
sample show an increase in size from 1920 to 1930 of the order of 20%. From the sample
data we have

$$y = \sum y_i = 6262, \qquad x = \sum x_i = 5054$$

Consequently the ratio estimate of the 1930 total for all 196 cities is

$$\hat Y_R = \frac{y}{x}\,X = \frac{6262}{5054}(22{,}919) = 28{,}397$$

The corresponding estimate based on the sample mean per city is

$$\hat Y = N\bar y = \frac{(196)(6262)}{49} = 25{,}048$$

The correct total in 1930 is 29,351.

TABLE 6.1
SIZES OF 49 LARGE UNITED STATES CITIES (IN 1000's) IN 1920 ($x_i$) AND 1930 ($y_i$)

 x_i   y_i      x_i   y_i      x_i   y_i

  76    80        2    50      243   291
 138   143      507   634       87   105
  67    67      179   260       30   111
  29    50      121   113       71    79
 381   464       50    64      256   288
  23    48       44    58       43    61
  37    63       77    89       25    37
 120   115       64    63       94    85
  61    69       64    77       43    50
 387   459       56   142      298   317
  93   104       40    60       36    46
 172   183       40    64      161   232
  78   106       38    52       74    93
  66    86      136   139       45    53
  60    57      116   130       36    54
  46    65       46    53       50    58
  48    75
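
The arithmetic of the example can be reproduced with a short Python sketch. The function names `ratio_estimate_total` and `expansion_estimate_total` are illustrative choices, not notation from the text; the inputs are the sample totals and population figures quoted in the example.

```python
def ratio_estimate_total(y_total, x_total, X):
    """Ratio estimate (6.1): sample ratio y/x times the known total X."""
    return y_total / x_total * X


def expansion_estimate_total(y_total, n, N):
    """Simple expansion estimate: N times the sample mean per unit."""
    return N * y_total / n


# Figures quoted in the example: y = 6262, x = 5054, X = 22,919, n = 49, N = 196.
Y_R = ratio_estimate_total(6262, 5054, 22919)
Y_exp = expansion_estimate_total(6262, 49, 196)
print(round(Y_R), round(Y_exp))  # 28397 25048
```

Against the correct 1930 total of 29,351 quoted above, the ratio estimate is off by about 3% and the expansion estimate by about 15% in this particular sample.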


[Figure 6.1: frequency distributions of the two estimates. Horizontal axis: total
population (millions); vertical axis: frequency.]

Fig. 6.1 Experimental comparison of the ratio estimate with the estimate based on the sample mean.

Figure 6.1 shows the ratio estimate and the estimate based on the sample mean per city
for each of 200 simple random samples of size 49 drawn from this population. A substantial
improvement in precision from the ratio method is apparent.


6.3 APPROXIMATE VARIANCE OF THE RATIO 
ESTIMATE 


The distribution of the ratio estimate has proved annoyingly intractable
because both y and x vary from sample to sample. The known theoretical results
fall short of what we would like to know for practical applications. The principal
results are stated first without proof.

The ratio estimate is consistent (this is obvious). It is biased, except for some
special types of population, although the bias is negligible in large samples. The
limiting distribution of the ratio estimate, as n becomes very large, is normal,
subject to some mild restrictions on the type of population from which we are
sampling. In samples of moderate size the distribution shows a tendency to
positive skewness in the kinds of populations for which the method is most often
used. We have an exact formula for the bias but for the sampling variance of the
estimate only an approximation valid in large samples.

These results amount to saying that there is no difficulty if the sample is large
enough so that (a) the ratio is nearly normally distributed and (b) the large-sample
formula for its variance is valid. As a working rule, the large-sample results may be
used if the sample size exceeds 30 and is also large enough so that the coefficients
of variation of $\bar x$ and $\bar y$ are both less than 10%.


Theorem 6.1. The ratio estimates of the population total Y, the population
mean $\bar Y$, and the population ratio Y/X are, respectively,

$$\hat Y_R = \frac{y}{x}\,X, \qquad \hat{\bar Y}_R = \frac{y}{x}\,\bar X, \qquad \hat R = \frac{y}{x}$$

In a simple random sample of size n (n large)

$$V(\hat Y_R) \approx \frac{N^2(1-f)}{n}\,\frac{\sum_{i=1}^{N}(y_i - Rx_i)^2}{N-1} \qquad (6.2)$$

$$V(\hat{\bar Y}_R) \approx \frac{1-f}{n}\,\frac{\sum_{i=1}^{N}(y_i - Rx_i)^2}{N-1} \qquad (6.3)$$

$$V(\hat R) \approx \frac{1-f}{n\bar X^2}\,\frac{\sum_{i=1}^{N}(y_i - Rx_i)^2}{N-1} \qquad (6.4)$$

where f = n/N is the sampling fraction. The method used in theorem 2.5 shows
that (6.2), (6.3), and (6.4) are also approximations to the mean square errors of the
estimators in these formulas.

The argument leading to the approximate result (6.4) was given in theorem 2.5.
Since $\hat{\bar Y}_R = \bar X\hat R$ and $\hat Y_R = N\bar X\hat R$, the other two results
follow immediately.


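Corollary 1. There are various alternative forms of the result. Since $\bar Y = R\bar X$,
we may write

$$V(\hat Y_R) = \frac{N^2(1-f)}{n(N-1)}\sum_{i=1}^{N}\left[(y_i - \bar Y) - R(x_i - \bar X)\right]^2$$

$$= \frac{N^2(1-f)}{n(N-1)}\left[\sum (y_i - \bar Y)^2 + R^2\sum (x_i - \bar X)^2 - 2R\sum (y_i - \bar Y)(x_i - \bar X)\right]$$

The correlation coefficient $\rho$ between $y_i$ and $x_i$ in the finite population is
defined by the equation

$$\rho = \frac{\sum (y_i - \bar Y)(x_i - \bar X)}{\sqrt{\sum (y_i - \bar Y)^2\,\sum (x_i - \bar X)^2}} = \frac{\sum (y_i - \bar Y)(x_i - \bar X)}{(N-1)S_yS_x}$$

This leads to the result

$$V(\hat Y_R) = \frac{N^2(1-f)}{n}\left(S_y^2 + R^2S_x^2 - 2R\rho S_yS_x\right) \qquad (6.5)$$

An equivalent form is

$$V(\hat Y_R) = \frac{(1-f)}{n}\,Y^2\left(\frac{S_y^2}{\bar Y^2} + \frac{S_x^2}{\bar X^2} - \frac{2S_{yx}}{\bar Y\bar X}\right) \qquad (6.6)$$

where $S_{yx} = \rho S_yS_x$ is the covariance between $y_i$ and $x_i$. This relation may
also be written as

$$V(\hat Y_R) = \frac{(1-f)}{n}\,Y^2\left(C_{yy} + C_{xx} - 2C_{yx}\right) \qquad (6.7)$$

where $C_{yy}$, $C_{xx}$ are the squares of the coefficients of variation (cv) of $y_i$
and $x_i$, respectively, and $C_{yx}$ is the relative covariance.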


Corollary 2. Since $\hat Y_R$, $\hat{\bar Y}_R$, and $\hat R$ differ only by known
multipliers, the coefficient of variation (i.e., the standard error divided by the
quantity being estimated) is the same for all three estimates. From (6.7) the square of
this cv is

$$(cv)^2 = \frac{V(\hat Y_R)}{Y^2} = \frac{1-f}{n}\left(C_{yy} + C_{xx} - 2C_{yx}\right) \qquad (6.8)$$

The quantity $(cv)^2$ has been called the relative variance by Hansen et al. (1953). Its
use avoids repetition of variance formulas for related quantities like the estimated
population total and mean.


6.4 ESTIMATION OF THE VARIANCE FROM A SAMPLE 


From equation (6.2),

$$V(\hat Y_R) \approx \frac{N^2(1-f)}{n(N-1)}\sum_{i=1}^{N}(y_i - Rx_i)^2$$

As already mentioned in section 2.11, we take

$$\frac{\sum_{i=1}^{n}(y_i - \hat Rx_i)^2}{n-1}$$

as a sample estimate of the population variance. This estimate has a bias of
order 1/n.

For the estimated variance, $v(\hat Y_R)$, this gives

$$v(\hat Y_R) = \frac{N^2(1-f)}{n(n-1)}\sum_{i=1}^{n}(y_i - \hat Rx_i)^2 \qquad (6.9)$$

This result may be expressed in several different ways. For example,

$$v(\hat Y_R) = \frac{N^2(1-f)}{n(n-1)}\left(\sum y_i^2 + \hat R^2\sum x_i^2 - 2\hat R\sum y_ix_i\right) \qquad (6.10)$$

$$= \frac{N^2(1-f)}{n}\left(s_y^2 + \hat R^2s_x^2 - 2\hat Rs_{yx}\right) \qquad (6.11)$$

where $s_{yx} = \sum (y_i - \bar y)(x_i - \bar x)/(n-1)$ is the sample covariance between
$y_i$ and $x_i$.

There are two alternative formulas for the sample estimate of the variance.
Since $\hat Y_R = N\bar X\hat R$, one form for $\hat R$ is

$$v_1(\hat R) = \frac{1-f}{n\bar X^2}\left(s_y^2 + \hat R^2s_x^2 - 2\hat Rs_{yx}\right) \qquad (6.12)$$

Since, however, $\hat R = \bar y/\bar x$, the quantity $\bar X$ need not be known and is
sometimes not known when estimating R. This suggests the alternative form

$$v_2(\hat R) = \frac{1-f}{n\bar x^2}\left(s_y^2 + \hat R^2s_x^2 - 2\hat Rs_{yx}\right) \qquad (6.13)$$

This form could also be used for $v(\hat Y_R)$, taking $v_2(\hat Y_R) = X^2v_2(\hat R)$.

This raises the question: If $\bar X$ is known, is $v_1$ preferable to $v_2$? The answer
is not at present clear. P. S. R. S. Rao and J. N. K. Rao (1971) studied the biases in
$v_1$ and $v_2$ analytically in an infinite population with the linear regression model

$$y_i = \alpha + \beta x_i + e_i$$

with

$$E(e_i|x_i) = 0, \qquad V(e_i|x_i) = \delta x_i^t, \qquad E(e_ie_j|x_ix_j) = 0,$$

where $0 \le t \le 2$ and $x_i$ has a gamma distribution $ax^{h-1}e^{-x}$. The range
$0 \le t \le 2$ was studied because in applications the residual variance of $y_i$ is
thought to increase with $x_i$ at different rates in different populations. They found
$v_2$ less biased for $0 \le t \le 3/2$, but also less stable for t = 0 or t = 1.
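
The two sample variance estimates for the ratio can be computed side by side. The sketch below is an illustration only: the function name `ratio_variance_estimates`, the returned dictionary keys, and the small data set are hypothetical. It follows (6.11) to (6.13), and returns `None` for $v_1$ when the population mean of x is not supplied.

```python
def ratio_variance_estimates(y, x, N, X_bar=None):
    """Large-sample variance estimates for the ratio R_hat = y_bar / x_bar.

    y, x  -- sample values from a simple random sample
    N     -- population size
    X_bar -- population mean of x, if known (needed for v1 only)
    """
    n = len(y)
    f = n / N
    y_bar = sum(y) / n
    x_bar = sum(x) / n
    R_hat = y_bar / x_bar
    s_yy = sum((a - y_bar) ** 2 for a in y) / (n - 1)
    s_xx = sum((b - x_bar) ** 2 for b in x) / (n - 1)
    s_yx = sum((a - y_bar) * (b - x_bar) for a, b in zip(y, x)) / (n - 1)
    core = (1 - f) / n * (s_yy + R_hat ** 2 * s_xx - 2 * R_hat * s_yx)
    v2 = core / x_bar ** 2                               # (6.13)
    v1 = core / X_bar ** 2 if X_bar is not None else None  # (6.12)
    return {"R_hat": R_hat, "v1": v1, "v2": v2, "core": core}


# Hypothetical data: four sampled units from a population of 40.
est = ratio_variance_estimates([2.0, 4.0, 6.0, 9.0], [1.0, 2.0, 3.0, 4.0],
                               N=40, X_bar=5.0)
```

The two forms differ only in the divisor, $\bar X^2$ against $\bar x^2$, so they coincide whenever the sample mean of x happens to equal the population mean.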


6.5 CONFIDENCE LIMITS 


If the sample is large enough so that the normal approximation applies,
confidence limits for Y and R may be obtained.

$$Y:\ \hat Y_R \pm z\sqrt{v(\hat Y_R)} \qquad (6.14)$$

$$R:\ \hat R \pm z\sqrt{v(\hat R)} \qquad (6.15)$$

where z is the normal deviate corresponding to the chosen confidence probability.

In section 6.3 it was suggested that the normal approximation holds reasonably
well if the sample size is at least 30 and is large enough so that the cv's of
$\bar x$ and $\bar y$ are both less than 0.1. When these conditions do not apply, the
formula for $v(\hat R)$ tends to give values that are too low and the positive skewness
in the distribution of $\hat R$ may become noticeable.

An alternative method of computing confidence limits, which takes some
account of the skewness of the distribution of $\hat R$, has been used in biological
assay (Fieller, 1932; Paulson, 1942). The method requires that $\bar y$ and $\bar x$
follow a bivariate normal distribution, so that $(\bar y - R\bar x)$ is normally
distributed. It follows that in simple random samples the quantity

$$\frac{\bar y - R\bar x}{\sqrt{(N-n)/Nn}\,\sqrt{s_y^2 + R^2s_x^2 - 2Rs_{yx}}} \qquad (6.16)$$

is approximately normally distributed with mean zero and unit standard deviation.

The value of R is unknown, but any contemplated value of R which makes this
normal deviate large enough may be regarded as rejected by the sample data.
Consequently, confidence limits for R are found by setting (6.16) equal to $\pm z$ and
solving the resulting quadratic equation for R. The confidence limits are approximate
since the two roots of the quadratic are imaginary with some samples. Such
cases become rare if the cv's of $\bar x$ and $\bar y$ are less than 0.3.

After some manipulation, the two roots may be expressed as

$$R = \hat R\,\frac{(1 - z^2c_{\bar y\bar x}) \pm z\sqrt{(c_{\bar y\bar y} + c_{\bar x\bar x} - 2c_{\bar y\bar x}) - z^2(c_{\bar y\bar y}c_{\bar x\bar x} - c_{\bar y\bar x}^2)}}{1 - z^2c_{\bar x\bar x}} \qquad (6.17)$$

where

$$c_{\bar y\bar y} = \frac{(1-f)s_y^2}{n\bar y^2}$$

is the square of the estimated cv of $\bar y$, with analogous definitions of
$c_{\bar y\bar x}$ and $c_{\bar x\bar x}$. If $z^2c_{\bar y\bar y}$, $z^2c_{\bar x\bar x}$, and
$z^2c_{\bar y\bar x}$ are all small relative to 1, the limits reduce to

$$R = \hat R \pm z\hat R\sqrt{c_{\bar y\bar y} + c_{\bar x\bar x} - 2c_{\bar y\bar x}}$$

This expression is the same as the normal approximation (6.15).

Even with bivariate normality, the Fieller limits have been criticized as not
conservative enough. James, Wilkinson, and Venables (1975) explain the nature
of the difficulty and present an alternative method.
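
A minimal sketch of the Fieller-type limits may make the quadratic solution concrete. The function name `fieller_limits`, the argument names, and the numerical inputs are hypothetical. The function returns `None` when the roots are imaginary or when $1 - z^2c_{\bar x\bar x}$ is not positive, in which case the quadratic does not give a finite interval of the usual kind; that handling is a choice made for this sketch, not something prescribed by the text.

```python
from math import sqrt


def fieller_limits(y_bar, x_bar, s_yy, s_xx, s_yx, n, N, z=1.96):
    """Confidence limits for R from the roots (6.17) of the quadratic."""
    k = (1 - n / N) / n                  # (N - n) / (N n)
    R_hat = y_bar / x_bar
    c_yy = k * s_yy / y_bar ** 2         # squared estimated cv of y_bar
    c_xx = k * s_xx / x_bar ** 2         # squared estimated cv of x_bar
    c_yx = k * s_yx / (y_bar * x_bar)    # estimated relative covariance
    disc = (c_yy + c_xx - 2 * c_yx) - z ** 2 * (c_yy * c_xx - c_yx ** 2)
    denom = 1 - z ** 2 * c_xx
    if disc < 0 or denom <= 0:
        return None
    half = z * sqrt(disc)
    lower = R_hat * ((1 - z ** 2 * c_yx) - half) / denom
    upper = R_hat * ((1 - z ** 2 * c_yx) + half) / denom
    return lower, upper


# Hypothetical sample summary: n = 50 units from N = 1000.
limits = fieller_limits(y_bar=12.0, x_bar=10.0, s_yy=9.0, s_xx=4.0,
                        s_yx=5.0, n=50, N=1000)
```

Each returned limit, substituted for R in (6.16), gives a deviate of magnitude z, which is a convenient way to check an implementation.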



6.6 COMPARISON OF THE RATIO ESTIMATE WITH 
THE MEAN PER UNIT 


The type of estimate of Y that was studied in preceding chapters is $N\bar y$, where
$\bar y$ is the mean per unit for the sample (in simple random sampling) or a weighted
mean per unit (in stratified random sampling). Estimates of this kind are called
estimates based on the mean per unit or estimates obtained by simple expansion.


Theorem 6.2. In large samples, with simple random sampling, the ratio
estimate $\hat Y_R$ has a smaller variance than the estimate $\hat Y = N\bar y$ obtained
by simple expansion, if

$$\rho > \frac{1}{2}\left(\frac{S_x}{\bar X}\right)\Big/\left(\frac{S_y}{\bar Y}\right) = \frac{\text{coefficient of variation of } x_i}{2(\text{coefficient of variation of } y_i)}$$

Proof. For $\hat Y$ we have

$$V(\hat Y) = \frac{N^2(1-f)}{n}\,S_y^2$$

For the ratio estimate we have from (6.5)

$$V(\hat Y_R) = \frac{N^2(1-f)}{n}\left(S_y^2 + R^2S_x^2 - 2R\rho S_yS_x\right)$$

Hence the ratio estimate has the smaller variance if

$$S_y^2 + R^2S_x^2 - 2R\rho S_yS_x < S_y^2$$

If $R = \bar Y/\bar X$ is positive, this condition becomes

$$\rho > \frac{RS_x}{2S_y} = \frac{1}{2}\left(\frac{S_x}{\bar X}\right)\Big/\left(\frac{S_y}{\bar Y}\right)$$

6.7 CONDITIONS UNDER WHICH THE RATIO ESTIMATOR 
IS A BEST LINEAR UNBIASED ESTIMATOR 


A well-known result in regression theory indicates the type of population under
which the ratio estimate may be called the best among a wide class of estimates.
The result was first proved for infinite populations. Brewer (1963b) and Royall
(1970a) extended the result to finite populations. The result holds if two
conditions are satisfied.


1. The relation between y; and x; is a straight line through the origin. 
2. The variance of y; about this line is proportional to x;. 


A “best linear unbiased estimator” is defined as follows. Consider all estimators
$\hat Y$ of Y that are linear functions of the sample values $y_i$, that is, that are of
the form

$$l_1y_1 + l_2y_2 + \cdots + l_ny_n$$

where the l's do not depend on the $y_i$, although they may be functions of the $x_i$.
The choice of l's is restricted to those that give unbiased estimation of Y. The
estimator with the smallest variance is called the best linear unbiased estimator
(BLUE).

Formally, Brewer and Royall assume that the N population values $(y_i, x_i)$ are a
random sample from a superpopulation in which

$$y_i = \beta x_i + \varepsilon_i \qquad (6.19)$$

where the $\varepsilon_i$ are independent of the $x_i$ and $x_i > 0$. In arrays in which
$x_i$ is fixed, $\varepsilon_i$ has mean 0 and variance $\lambda x_i$. The $x_i$
(i = 1, 2, ..., N) are known.

In the randomization theory used thus far in this book, the finite population
total Y has been regarded as a fixed quantity. Under model (6.19), on the other
hand, $Y = \beta X + \sum_{i=1}^{N}\varepsilon_i$ is a random variable. In defining an
unbiased estimator under this model, Brewer and Royall use a concept of unbiasedness
which differs from that in randomization theory. They regard an estimator $\hat Y$ as
unbiased if $E(\hat Y) = E(Y)$ in repeated selections of the finite population and
sample under the model. Such an estimator might be called model-unbiased.


Theorem 6.3. Under model (6.19) the ratio estimator $\hat{Y}_R = X\bar{y}/\bar{x}$ is a best
linear unbiased estimator for any sample, random or not, selected solely according
to the values of the $x_i$.

Proof. Since $E(\varepsilon_i \mid x_i) = 0$ in repeated sampling, it follows from (6.19) that

$$Y = \beta X + \sum^N \varepsilon_i : \quad E(Y) = \beta X \tag{6.20}$$

Furthermore, with the model (6.19) any linear estimator $\hat{Y}$ is of the form

$$\hat{Y} = \sum^n l_i y_i = \beta \sum^n l_i x_i + \sum^n l_i \varepsilon_i \tag{6.21}$$

If we keep the $n$ sample values $x_i$ fixed in repeated sampling under the model
(6.19),

$$E(\hat{Y}) = \beta \sum^n l_i x_i : \quad V(\hat{Y}) = \lambda \sum^n l_i^2 x_i \tag{6.22}$$

From (6.20) and (6.22), $\hat{Y}$ is clearly model-unbiased if $\sum^n l_i x_i = X$. Minimizing
$V(\hat{Y})$ under this condition by a Lagrange multiplier gives

$$2 l_i x_i = c x_i : \quad l_i = \text{constant} = X/n\bar{x} \tag{6.23}$$

The constant must have the value $X/n\bar{x}$ in order to satisfy the model-unbiased
condition $\sum^n l_i x_i = X$. Hence the BLUE estimator $\hat{Y}$ is $n\bar{y}X/n\bar{x} = X\bar{y}/\bar{x} = \hat{Y}_R$, the
usual ratio estimator. This completes the proof.
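Furthermore, from (6.20) and (6.21), with $l = X/n\bar{x}$,

$$\hat{Y}_R - Y = l \sum^n \varepsilon_i - \sum^N \varepsilon_i = (l - 1)\sum^n \varepsilon_i - \sum^{N-n} \varepsilon_i \tag{6.24}$$

$$= \frac{X - n\bar{x}}{n\bar{x}} \sum^n \varepsilon_i - \sum^{N-n} \varepsilon_i \tag{6.25}$$

where $\sum^{N-n}$ denotes the sum over the $(N - n)$ population values that are not in the
sample. Hence

$$V(\hat{Y}_R) = \frac{\lambda (X - n\bar{x})^2 (n\bar{x})}{(n\bar{x})^2} + \lambda (X - n\bar{x}) = \frac{\lambda (X - n\bar{x}) X}{n\bar{x}} \tag{6.26}$$

A model-unbiased estimator of $\lambda$ from this sample is easily shown to be

$$\hat{\lambda} = \sum^n \frac{1}{x_i}(y_i - \hat{R}x_i)^2 / (n - 1) \tag{6.27}$$

where $\hat{R} = \bar{y}/\bar{x}$, as usual. This value may be substituted in (6.26) to give a
model-unbiased sample estimate of $V(\hat{Y}_R)$.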
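As a minimal sketch of how (6.26) and (6.27) might be used together, the following computes the ratio estimate of the total, the estimate $\hat{\lambda}$, and the resulting model-based variance estimate. The function name and the small data set are hypothetical and chosen only for illustration; the code assumes every $x_i > 0$, as the model requires.

```python
def ratio_estimate_with_model_variance(y, x, x_total):
    """Ratio estimate of the total Y with the model-based variance (6.26)-(6.27).

    y, x    : sample values (all x > 0), selected by any rule depending only on x
    x_total : known population total X
    """
    n = len(y)
    if n < 2 or n != len(x):
        raise ValueError("need at least two paired observations")
    sum_x = sum(x)                      # n * xbar
    r_hat = sum(y) / sum_x              # R-hat = ybar / xbar
    y_hat = x_total * r_hat             # ratio estimate of Y
    # (6.27): lambda-hat = sum[(y_i - R-hat x_i)^2 / x_i] / (n - 1)
    lam_hat = sum((yi - r_hat * xi) ** 2 / xi for yi, xi in zip(y, x)) / (n - 1)
    # (6.26): V = lambda (X - n xbar) X / (n xbar), with lambda-hat substituted
    var_hat = lam_hat * (x_total - sum_x) * x_total / sum_x
    return y_hat, lam_hat, var_hat

# Hypothetical sample of n = 4 units from a population with X = 100.
y_hat, lam_hat, var_hat = ratio_estimate_with_model_variance(
    y=[3.0, 5.0, 9.0, 11.0], x=[2.0, 4.0, 6.0, 8.0], x_total=100.0)
print(y_hat, lam_hat, var_hat)
```

Note that the variance estimate falls to zero when the sample covers the whole population ($n\bar{x} = X$), as (6.26) requires.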

The practical relevance of these results is that they suggest the conditions under
which the ratio estimator is not only superior to $\hat{Y}$ but is the best of a whole class of
estimators. When we are trying to decide what kind of estimate to use, a graph in
which the sample values of $y_i$ are plotted against those of $x_i$ is helpful. If this graph
shows a straight line relation passing through the origin and if the variance of the
points $y_i$ about the line seems roughly proportional to $x_i$, the ratio estimator will be
hard to beat.

Sometimes the variance of the $y_i$ in arrays in which $x_i$ is fixed is not proportional
to $x_i$. If this residual variance is of the form $\lambda v(x_i)$, where $v(x_i)$ is known, Brewer
and Royall showed that the BLUE estimator becomes

$$\hat{Y} = X \frac{\sum^n w_i y_i x_i}{\sum^n w_i x_i^2} \tag{6.28}$$

where $w_i = 1/v(x_i)$. In a population sample of Greece, Jessen et al. (1947) judged
that the residual variance increased roughly as $x_i^2$. This suggests a weighted
regression with $w_i = 1/x_i^2$, which gives

$$\hat{Y} = X \frac{\sum^n w_i y_i x_i}{\sum^n w_i x_i^2} = \frac{X}{n} \sum^n \frac{y_i}{x_i} \tag{6.29}$$


For a given population and given $n$, $V(\hat{Y}_R)$ in (6.26) is clearly minimized, given
every $x_i > 0$, when the sample consists of the $n$ largest $x_i$ in the population. In 16
small natural populations of the type to which ratio estimates have been applied,
Royall (1970) found for samples having $n = 2$ to 12 that selection of the $n$ largest
$x_i$ usually increased the accuracy of $\hat{Y}_R$.

In summary, the Brewer-Royall results show that the assumption of a certain
type of model leads to an unbiased ratio estimator and formulas for $V(\hat{Y}_R)$ and
$v(\hat{Y}_R)$ that are simple and exact for any $n > 1$. The results might be used in
practice in cases where examination of the $y$, $x$ pairs from the available data
suggests that the model is reasonably correct. The variance formulas (6.26) and
(6.27) appear to be sensitive to inaccuracy in the model, although this issue needs
further study.

Further work by Royall and Herson (1973) discusses the type of sample
distribution needed with respect to the $x_i$ in order that $\hat{Y}_R$ remains unbiased when
there is a polynomial regression of $y_i$ on $x_i$.


6.8 BIAS OF THE RATIO ESTIMATE 


In general, the ratio estimate has a bias of order $1/n$. Since the s.e. of the
estimate is of order $1/\sqrt{n}$, the quantity (bias/s.e.) is also of order $1/\sqrt{n}$ and
becomes negligible as $n$ becomes large. In practice, this quantity is usually
unimportant in samples of moderate size. Its value in small samples is of interest,
however, in stratified sampling with many strata, where we may wish to compute
and examine ratio estimates in individual strata with small samples in the strata.
Two useful results about the bias are presented.

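The first gives the leading term in the bias when it is expanded in a Taylor's
series.

$$\hat{R} - R = \frac{\bar{y}}{\bar{x}} - R = \frac{\bar{y} - R\bar{x}}{\bar{x}}$$

Write

$$\frac{1}{\bar{x}} = \frac{1}{\bar{X} + (\bar{x} - \bar{X})} = \frac{1}{\bar{X}}\left(1 + \frac{\bar{x} - \bar{X}}{\bar{X}}\right)^{-1} \approx \frac{1}{\bar{X}}\left[1 - \frac{\bar{x} - \bar{X}}{\bar{X}}\right] \tag{6.30}$$

Hence,

$$\hat{R} - R \approx \frac{\bar{y} - R\bar{x}}{\bar{X}}\left[1 - \frac{\bar{x} - \bar{X}}{\bar{X}}\right] \tag{6.31}$$

Now

$$E(\bar{y} - R\bar{x}) = \bar{Y} - R\bar{X} = 0$$

so that the leading term in the bias comes from the second term inside the
brackets. Furthermore,

$$E\,\bar{y}(\bar{x} - \bar{X}) = E(\bar{y} - \bar{Y})(\bar{x} - \bar{X}) = \frac{1-f}{n}\rho S_y S_x \tag{6.32}$$

by theorem 2.3 (p. 25) and the definition of $\rho$. Also,

$$E\,\bar{x}(\bar{x} - \bar{X}) = E(\bar{x} - \bar{X})^2 = \frac{1-f}{n} S_x^2$$

Hence the leading term in the bias is

$$E(\hat{R} - R) \approx \frac{1-f}{n\bar{X}^2}\left(R S_x^2 - \rho S_y S_x\right) \tag{6.33}$$

$$= \frac{1-f}{n}\left(C_{xx} - C_{yx}\right) R \tag{6.34}$$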


For a rigorous justification of (6.34), see David and Sukhatme (1974).
Now the leading term in $V(\hat{R})$ is

$$V(\hat{R}) \approx \frac{1-f}{n\bar{X}^2}\left(S_y^2 + R^2 S_x^2 - 2R\rho S_y S_x\right) \tag{6.35}$$

from (6.5), substituting $\hat{R} = \hat{Y}_R / N\bar{X}$.

From (6.33) and (6.35) the leading term in the quantity (bias/s.e.), which is the
same for $\hat{R}$, $\hat{Y}_R$, and $\hat{\bar{Y}}_R$, may be expressed as

$$\frac{\text{bias}}{\text{s.e.}} = \text{cv}(\bar{x}) \frac{R S_x - \rho S_y}{\left(R^2 S_x^2 - 2R\rho S_y S_x + S_y^2\right)^{1/2}} \tag{6.36}$$

where $\text{cv}(\bar{x}) = \sqrt{1-f}\, S_x / \sqrt{n}\,\bar{X}$. By substituting sample estimates of the terms in
(6.36), Kish, Namboodiri, and Pillai (1962) computed the (bias/s.e.) values for
numerous items in various national and more localized studies. In the national
studies nearly all the (bias/s.e.) values were $< 0.03$ and almost the only values
$> 0.10$ in their studies occurred for a single stratum with $n_h = 6$ small hospitals.

The second result, due to Hartley and Ross (1954), gives an exact result for the
bias and an upper bound to the ratio of the bias to the standard error. Consider the
covariance, in simple random samples of size $n$, of the quantities $\hat{R}$ and $\bar{x}$. We have

$$\text{cov}(\hat{R}, \bar{x}) = E\left(\frac{\bar{y}}{\bar{x}}\,\bar{x}\right) - E(\hat{R})E(\bar{x}) \tag{6.37}$$

$$= \bar{Y} - \bar{X}E(\hat{R}) \tag{6.38}$$

Hence

$$E(\hat{R}) = \frac{\bar{Y}}{\bar{X}} - \frac{1}{\bar{X}}\text{cov}(\hat{R}, \bar{x}) = R - \frac{1}{\bar{X}}\text{cov}(\hat{R}, \bar{x}) \tag{6.39}$$

Thus the bias in $\hat{R}$ is $-\text{cov}(\hat{R}, \bar{x})/\bar{X}$. Unlike the Taylor approximation (6.33) to
the bias, this expression is exact.
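Furthermore,

$$|\text{bias in } \hat{R}| = \frac{|\rho_{\hat{R},\bar{x}}|\, \sigma_{\hat{R}}\, \sigma_{\bar{x}}}{\bar{X}} \le \frac{\sigma_{\hat{R}}\, \sigma_{\bar{x}}}{\bar{X}}$$

since $\hat{R}$ and $\bar{x}$ cannot have a correlation $> 1$. Hence

$$\frac{|\text{bias in } \hat{R}|}{\sigma_{\hat{R}}} \le \frac{\sigma_{\bar{x}}}{\bar{X}} = \text{cv of } \bar{x} \tag{6.40}$$

The same bound applies, of course, to the bias in $\hat{Y}_R$ and $\hat{\bar{Y}}_R$. Thus, if the cv of $\bar{x}$ is
less than 0.1, the bias may safely be regarded as negligible in relation to the s.e.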
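The Hartley-Ross identity and the bound (6.40) can be verified by brute force on a small population. The sketch below enumerates every simple random sample of size $n$, then compares the exact bias of $\hat{R}$ with $-\text{cov}(\hat{R}, \bar{x})/\bar{X}$ and the ratio |bias|/s.e. with the cv of $\bar{x}$. The six-unit population and the function name are hypothetical, introduced only for this illustration.

```python
from itertools import combinations
from math import sqrt

def ratio_bias_check(y, x, n):
    """Enumerate all simple random samples of size n and compare the exact
    bias of R-hat with -cov(R-hat, xbar)/Xbar and with the bound (6.40)."""
    N = len(y)
    r_pop = sum(y) / sum(x)
    x_bar_pop = sum(x) / N
    r_hats, x_bars = [], []
    for idx in combinations(range(N), n):
        sx = sum(x[i] for i in idx)
        sy = sum(y[i] for i in idx)
        r_hats.append(sy / sx)
        x_bars.append(sx / n)
    m = len(r_hats)
    e_r = sum(r_hats) / m
    e_x = sum(x_bars) / m
    # Moments over the randomization distribution (all samples equally likely).
    cov = sum((r - e_r) * (xb - e_x) for r, xb in zip(r_hats, x_bars)) / m
    sd_r = sqrt(sum((r - e_r) ** 2 for r in r_hats) / m)
    sd_x = sqrt(sum((xb - e_x) ** 2 for xb in x_bars) / m)
    return {
        "bias": e_r - r_pop,
        "minus_cov_over_Xbar": -cov / x_bar_pop,
        "abs_bias_over_se": abs(e_r - r_pop) / sd_r,
        "cv_xbar": sd_x / x_bar_pop,
    }

# Hypothetical population of N = 6 units.
result = ratio_bias_check(y=[2, 3, 4, 11, 9, 7], x=[2, 4, 6, 20, 8, 5], n=3)
print(result)
```

The first two entries agree to rounding error for any population with positive $x_i$, and the third never exceeds the fourth.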


6.9 ACCURACY OF THE FORMULAS FOR THE 
VARIANCE AND ESTIMATED VARIANCE 


With small samples, say $n < 30$ and $C_{xx}$ large, it has long been suspected that the
large-sample formulas given for $V(\hat{R})$ and $v(\hat{R})$ are underestimates. By a Taylor
series expansion, Sukhatme (1954) expressed the error in $V(\hat{R})$ in terms of the
bivariate moments of $y$ and $x$. Unfortunately, the result is too complicated to lead
to a useful guide for practical applications.

If $y$ and $x$ follow a bivariate normal distribution, Sukhatme's result simplifies
considerably. Let

$$V_1 = (C_{yy} + C_{xx} - 2C_{yx})/n$$

denote the first approximation to the relative variance of $\hat{R}$, ignoring the fpc. To
terms of order $1/n^2$,

$$E\left(\frac{\hat{R} - R}{R}\right)^2 = V_1\left(1 + \frac{3C_{xx}}{n} + \frac{6(C_{xx} - C_{yx})^2}{n(C_{yy} + C_{xx} - 2C_{yx})}\right) \tag{6.41}$$

Since the right-hand term inside the parentheses is less than $6C_{xx}/n$, this gives

$$E\left(\frac{\hat{R} - R}{R}\right)^2 \le V_1\left(1 + \frac{9C_{xx}}{n}\right) \tag{6.42}$$

to terms of order $1/n^2$. Now $C_{xx}/n$ is the square of the coefficient of variation of $\bar{x}$.
Thus, if $n$ is large enough so that the cv of $\bar{x}$ is less than 0.1, use of $V_1$ should not
underestimate by more than 9%. In practice, the multiplier 9 in (6.42) appears to
be unduly high as compared with (6.41). For instance, if $C_{xx} = C_{yy}$, (6.41) reduces
to

$$E\left(\frac{\hat{R} - R}{R}\right)^2 = V_1\left[1 + \frac{C_{xx}}{n}(6 - 3\rho)\right] \tag{6.43}$$

Since $\rho$ is almost always positive in applications of the ratio method, a multiplier
between 3 and 6 is more representative. However, the effects of nonnormality in $y$
and $x$ also enter into the term of order $1/n^2$.

From a Monte Carlo study by Rao (1968) of small natural populations, some
illustrative results on the biases of the large-sample formulas for $V(\hat{R})$ and $v_1(\hat{R})$
will be quoted for simple random samples from eight populations each with
$N > 30$. The populations are described in Rao (1969). The formulas for $V(\hat{R})$ and
$v_1(\hat{R})$, given in (6.4) and (6.12), will be appraised as estimators of the true
MSE$(\hat{R})$.

For a given population the quantities $100[\text{MSE}(\hat{R}) - V(\hat{R})]/\text{MSE}(\hat{R})$ and
$100[\text{MSE}(\hat{R}) - Ev_1(\hat{R})]/\text{MSE}(\hat{R})$ are the percent underestimates of the true
MSE$(\hat{R})$. The averages of these percents in the eight populations were as shown in
Table 6.2.


TABLE 6.2
AVERAGE PERCENT UNDERESTIMATION OF MSE$(\hat{R})$

                                     n
Estimator                   4     6     8    12
$V(\hat{R})$ in (6.4)          14    14    14    12
$v_1(\hat{R})$ in (6.12)       31    23    21    18


In these data the percent underestimation in $V(\hat{R})$ scarcely declines at all with
increasing $n$. This is explained in part by the circumstance that in one population
$V(\hat{R})$ gave an overestimate which declined with $n$. The indications from these
results are that the biases in $v_1(\hat{R})$ are much more serious in small samples than
those in $\hat{R}$ itself, and are unsatisfactory at least up to $n = 12$. For $n = 4$, Koop
(1968) found underestimations in $v_1(\hat{R})$ averaging 25% in three populations with
$N = 20$.

An alternative estimator to $v_1(\hat{R})$ that looks promising as regards bias reduction
is presented in section 6.17.


6.10 RATIO ESTIMATES IN STRATIFIED RANDOM 
SAMPLING 


There are two ways in which a ratio estimate of the population total $Y$ can be
made. One is to make a separate ratio estimate of the total of each stratum and add
these totals. If $y_h$, $x_h$ are the sample totals in the $h$th stratum and $X_h$ is the stratum
total of the $x_{hi}$, this estimate $\hat{Y}_{Rs}$ ($s$ for separate) is

$$\hat{Y}_{Rs} = \sum_h \frac{y_h}{x_h} X_h \tag{6.44}$$

No assumption is made that the true ratio remains constant from stratum to
stratum. The estimate requires a knowledge of the separate totals $X_h$.


Theorem 6.4. If an independent simple random sample is drawn in each
stratum and sample sizes are large in all strata,

$$V(\hat{Y}_{Rs}) = \sum_h \frac{N_h^2(1 - f_h)}{n_h}\left(S_{yh}^2 + R_h^2 S_{xh}^2 - 2R_h \rho_h S_{yh} S_{xh}\right) \tag{6.45}$$

where $R_h = Y_h/X_h$ is the true ratio in stratum $h$, and $\rho_h$ is defined as before in each
stratum.

Proof. Apply formula (6.2), section 6.3, for a simple random sample to give in
stratum $h$,

$$V(\hat{Y}_{Rh}) = \frac{N_h^2(1 - f_h)}{n_h}\left(S_{yh}^2 + R_h^2 S_{xh}^2 - 2R_h \rho_h S_{yh} S_{xh}\right) \tag{6.46}$$

Since $\hat{Y}_{Rs} = \sum_h \hat{Y}_{Rh}$ and sampling is independent in each stratum, $V(\hat{Y}_{Rs}) =
\sum_h V(\hat{Y}_{Rh})$ and the result (6.45) follows.

This formula is valid only if the sample in each stratum is large enough so that
the approximate variance formula applies to each stratum. This limitation should
be noted in practical applications.


Moreover, when the $n_h$ are small and the number of strata $L$ is large, the bias in
$\hat{Y}_{Rs}$ may not be negligible in relation to its standard error, as the following crude
argument suggests.

In a single stratum we have seen (section 6.8) that

$$\frac{|\text{bias in } \hat{Y}_{Rh}|}{\sigma(\hat{Y}_{Rh})} \le \text{cv of } \bar{x}_h$$

If the bias has the same sign in all strata, as may happen, the bias in $\hat{Y}_{Rs}$ will be
roughly $L$ times that in $\hat{Y}_{Rh}$. But the standard error of $\hat{Y}_{Rs}$ is only of the order of
$\sqrt{L}$ times that of $\hat{Y}_{Rh}$. Hence the ratio

$$\frac{|\text{bias in } \hat{Y}_{Rs}|}{\sigma(\hat{Y}_{Rs})}$$

is of order

$$\sqrt{L}\,(\text{cv of } \bar{x}_h)$$

For example, with 50 strata and the cv of $\bar{x}_h$ about 0.1 in each stratum, the bias
in $\hat{Y}_{Rs}$ might be as large as 0.7 times its standard error. The contribution of the
bias to the mean square error of $\hat{Y}_{Rs}$ would then be about one third.

Although in practice the bias is usually much smaller than its upper bound, the
danger of bias with the separate ratio estimate should be kept in mind if
$\sqrt{L}\,(\text{cv of } \bar{x}_h)$ exceeds, say, 0.3.
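The arithmetic behind the 50-strata example can be written out as a short sketch. It assumes, as the crude argument does, that every stratum attains the upper bound and that all stratum biases have the same sign; the function name is hypothetical. If bias/s.e. equals $b$, the bias contributes $b^2/(1 + b^2)$ of the mean square error.

```python
from math import sqrt

def bias_share_of_mse(num_strata, cv_xbar_h):
    """Crude upper-bound argument for the separate ratio estimate.

    Returns (bias/s.e. bound, share of MSE due to bias) when every stratum
    attains the bound |bias| / s.e. = cv of the stratum sample mean of x
    and all biases share the same sign.
    """
    bias_over_se = sqrt(num_strata) * cv_xbar_h
    share = bias_over_se ** 2 / (1.0 + bias_over_se ** 2)
    return bias_over_se, share

ratio, share = bias_share_of_mse(num_strata=50, cv_xbar_h=0.1)
print(f"bias/s.e. <= {ratio:.2f}, bias share of MSE = {share:.2f}")
```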


6.11 THE COMBINED RATIO ESTIMATE 


An alternative estimate is derived from a single combined ratio (Hansen,
Hurwitz, and Gurney, 1946). From the sample data we compute

$$\hat{Y}_{st} = \sum_h N_h \bar{y}_h, \qquad \hat{X}_{st} = \sum_h N_h \bar{x}_h \tag{6.47}$$

These are the standard estimates of the population totals $Y$ and $X$, respectively,
made from a stratified sample. The combined ratio estimate, $\hat{Y}_{Rc}$ ($c$ for combined)
is

$$\hat{Y}_{Rc} = \frac{\hat{Y}_{st}}{\hat{X}_{st}} X = \frac{\bar{y}_{st}}{\bar{x}_{st}} X \tag{6.48}$$

where $\bar{y}_{st} = \hat{Y}_{st}/N$, $\bar{x}_{st} = \hat{X}_{st}/N$ are the estimated population means from a
stratified sample.

The estimate $\hat{Y}_{Rc}$ does not require a knowledge of the $X_h$, but only of $X$.

The combined estimate is much less subject to the risk of bias than the separate
estimate. Using the approach of Hartley and Ross in section 6.8, we have, writing
$\hat{R}_c = \bar{y}_{st}/\bar{x}_{st}$,

$$\text{cov}(\hat{R}_c, \bar{x}_{st}) = E(\bar{y}_{st}) - E(\hat{R}_c)E(\bar{x}_{st}) = \bar{Y} - \bar{X}E(\hat{R}_c) \tag{6.49}$$

Hence

$$E(\hat{R}_c) = R - \frac{1}{\bar{X}}\text{cov}(\hat{R}_c, \bar{x}_{st})$$

and

$$\frac{|\text{bias in } \hat{R}_c|}{\sigma_{\hat{R}_c}} = \frac{|\rho_{\hat{R}_c, \bar{x}_{st}}|\, \sigma_{\bar{x}_{st}}}{\bar{X}} \le \text{cv of } \bar{x}_{st} \tag{6.50}$$

Thus the biases in $\hat{R}_c$, $\hat{Y}_{Rc}$ are negligible relative to their standard errors, provided
only that the cv of $\bar{x}_{st}$ is less than 0.1.


Theorem 6.5. If the total sample size $n$ is large,

$$V(\hat{Y}_{Rc}) = \sum_h \frac{N_h^2(1 - f_h)}{n_h}\left(S_{yh}^2 + R^2 S_{xh}^2 - 2R\rho_h S_{yh} S_{xh}\right) \tag{6.51}$$

Proof. This follows the same argument as theorem 2.5. In the present case the
key equation is

$$\hat{Y}_{Rc} - Y = \frac{N\bar{X}}{\bar{x}_{st}}(\bar{y}_{st} - R\bar{x}_{st}) \approx N(\bar{y}_{st} - R\bar{x}_{st}) \tag{6.52}$$

Now consider the variate $u_{hi} = y_{hi} - Rx_{hi}$. The right side of (6.52) is $N\bar{u}_{st}$, where $\bar{u}_{st}$
is the weighted mean of the variate $u_{hi}$ in a stratified sample. Furthermore, the
population mean $\bar{U}$ of $u_{hi}$ is zero, since $R = Y/X$.

Hence we may apply to $\bar{u}_{st}$ theorem 5.3 for the variance of the estimated mean
from a stratified random sample. This gives

$$V(\hat{Y}_{Rc}) \approx N^2 V(\bar{u}_{st}) = \sum_h \frac{N_h(N_h - n_h)}{n_h} S_{uh}^2 \tag{6.53}$$

where

$$S_{uh}^2 = \frac{1}{N_h - 1}\sum_{i=1}^{N_h}(u_{hi} - \bar{U}_h)^2 = \frac{1}{N_h - 1}\sum_{i=1}^{N_h}\left[(y_{hi} - \bar{Y}_h) - R(x_{hi} - \bar{X}_h)\right]^2$$

When the quadratic is expanded, result (6.51) is obtained.

From equations (6.45) and (6.51) it is interesting to note that the approximate
variances of $\hat{Y}_{Rs}$ and $\hat{Y}_{Rc}$ assume the same general form, the difference being that
the population ratios $R_h$ in the individual strata in (6.45) are all replaced by $R$ in
(6.51).


6.12 COMPARISON OF THE COMBINED AND 
SEPARATE ESTIMATES 
We may write

$$V(\hat{Y}_{Rc}) - V(\hat{Y}_{Rs}) = \sum_h \frac{N_h^2(1 - f_h)}{n_h}\left[(R^2 - R_h^2)S_{xh}^2 - 2(R - R_h)\rho_h S_{yh} S_{xh}\right]$$

$$= \sum_h \frac{N_h^2(1 - f_h)}{n_h}\left[(R - R_h)^2 S_{xh}^2 + 2(R_h - R)(\rho_h S_{yh} S_{xh} - R_h S_{xh}^2)\right]$$

In situations in which the ratio estimate is appropriate the last term on the right
is usually small. (It vanishes if within each stratum the relation between $y_{hi}$ and $x_{hi}$
is a straight line through the origin.) Thus, unless $R_h$ is constant from stratum to
stratum, the use of a separate ratio estimate in each stratum is likely to be more
precise if the sample in each stratum is large enough so that the approximate
formula for $V(\hat{Y}_{Rs})$ is valid, and the cumulative bias that can affect $\hat{Y}_{Rs}$ (section
6.10) is negligible. With only a small sample in each stratum, the combined
estimate is to be recommended unless there is good empirical evidence to the
contrary.

For sample estimates of these variances we substitute sample estimates of $R_h$
and $R$ in the appropriate places. The sample mean squares $s_{yh}^2$ and $s_{xh}^2$ are
substituted for the corresponding variances and the sample covariance for the
term $\rho_h S_{yh} S_{xh}$. The sample mean square and covariance must be calculated
separately for each stratum.


Example. The data come from a census of all farms in Jefferson County, Iowa. In this
example $y_{hi}$ represents acres in corn and $x_{hi}$ acres in the farm. The population is divided
into two strata, the first stratum containing farms of as many as 160 acres. We assume a
sample of 100 farms. When stratified sampling is used, we will suppose that 70 farms are
taken from stratum 1 and 30 from stratum 2, this being roughly the optimum allocation.
The data are given in Table 6.3. The last three quantities, $Q_h$, $V_h'$, and $V_h''$, are auxiliary
quantities to be used in the computations, the last two being defined later.


TABLE 6.3
DATA FROM JEFFERSON COUNTY, IOWA

                      Size
Strata                (farm acres)     $N_h$   $S_{yh}^2$   $S_{yxh}$   $S_{xh}^2$   $R_h$
1                     0-160            1580    312          494         2055         0.2350
2                     More than 160     430    922          858         7357         0.2109
For complete pop.                      2010    620         1453         7619         0.2242

Strata                $\bar{Y}_h$   $\bar{X}_h$   $n_h$   $Q_h = W_h^2/n_h$   $V_h'$   $V_h''$
1                     19.40         82.56          70     0.008828            193      194
2                     51.63        244.85          30     0.001525            887      907
For complete pop.     26.30        117.28         100


We consider five methods of estimating the population mean corn acres per farm. The fpc
are ignored.

1. Simple random sample: mean per farm estimate.

$$V_1 = \frac{S_y^2}{n} = \frac{620}{100} = 6.20$$

2. Simple random sample: ratio estimate.

$$V_2 = \frac{1}{n}\left(S_y^2 + R^2 S_x^2 - 2R S_{yx}\right) = \frac{1}{100}\left[620 + (0.2242)^2(7619) - 2(0.2242)(1453)\right] = 3.51$$

3. Stratified random sample: mean per farm estimate.

$$V_3 = \sum_h \frac{W_h^2}{n_h} S_{yh}^2 = \sum_h Q_h S_{yh}^2 = 4.16$$

4. Stratified random sample: ratio estimate using a separate ratio in each stratum.

$$V_4 = \sum_h Q_h\left(S_{yh}^2 + R_h^2 S_{xh}^2 - 2R_h S_{yxh}\right) = \sum_h Q_h V_h' = 3.06$$

5. Stratified random sampling: ratio estimate using a combined ratio.

$$V_5 = \sum_h Q_h\left(S_{yh}^2 + R^2 S_{xh}^2 - 2R S_{yxh}\right) = \sum_h Q_h V_h'' = 3.10$$

The relative precisions of the various methods can be summarized as follows.

                           Method of          Relative
Sampling method            Estimation         Precision
1. Simple random           Mean per farm        100
2. Simple random           Ratio                177
3. Stratified random       Mean per farm        149
4. Stratified random       Separate ratio       203
5. Stratified random       Combined ratio       200

The results bring out an interesting point of wide application. Stratification by size of
farm accomplishes the same general purpose as a ratio estimate in which the denominator is
farm size. Both devices diminish the effect of variations in farm size on the sampling error of
the estimated mean corn acres per farm. For instance, the gain in precision from a ratio
estimate is 77% when simple random sampling is used, but it is only 36% (203 against 149)
when stratified sampling is used.

In the design of surveys there may be a choice between introducing a factor into the
stratification or utilizing it in the method of estimation. The best decision depends on the
circumstances. Relevant points are: (a) some factors, for example, geographical location,
are more easily introduced into the stratification than into the method of estimation; (b) the
issue depends on the nature of the relation between $y_i$ and $x_i$. All simple methods of
estimation work most effectively with a linear relation. With a complex or discontinuous
relation, stratification may be more effective, since, if there are enough strata, stratification
will eliminate the effects of almost any kind of relation between $y_i$ and $x_i$; (c) if some
important variates are roughly proportional to $x_i$ but others are roughly proportional to
another variate $z_i$, it is better to use $x_i$ and $z_i$ as denominators in ratio estimates than to
stratify by one of them.
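The five variances in the Jefferson County example can be reproduced from the population values in Table 6.3. The following is a minimal sketch, with hypothetical variable and function names, that ignores the fpc as the example does. Because the tabulated $R$, $R_h$, and $Q_h$ are rounded, the computed values may differ from the printed ones in the last digit.

```python
# Population values from Table 6.3 (fpc ignored, n = 100).
strata = [
    # N_h, n_h, S_yh^2, S_yxh, S_xh^2, R_h
    dict(N=1580, n=70, syy=312.0, syx=494.0, sxx=2055.0, R=0.2350),
    dict(N=430,  n=30, syy=922.0, syx=858.0, sxx=7357.0, R=0.2109),
]
pop = dict(N=2010, n=100, syy=620.0, syx=1453.0, sxx=7619.0, R=0.2242)

def residual_variance(syy, syx, sxx, r):
    """S_y^2 + R^2 S_x^2 - 2 R S_yx, the variance of y - R x."""
    return syy + r * r * sxx - 2.0 * r * syx

def q(h):
    """Q_h = W_h^2 / n_h with W_h = N_h / N."""
    return (h["N"] / pop["N"]) ** 2 / h["n"]

v1 = pop["syy"] / pop["n"]
v2 = residual_variance(pop["syy"], pop["syx"], pop["sxx"], pop["R"]) / pop["n"]
v3 = sum(q(h) * h["syy"] for h in strata)
v4 = sum(q(h) * residual_variance(h["syy"], h["syx"], h["sxx"], h["R"])
         for h in strata)
v5 = sum(q(h) * residual_variance(h["syy"], h["syx"], h["sxx"], pop["R"])
         for h in strata)

variances = {"V1": v1, "V2": v2, "V3": v3, "V4": v4, "V5": v5}
for name, v in variances.items():
    print(f"{name} = {v:.2f}, relative precision = {100.0 * v1 / v:.0f}")
```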


6.13 SHORT-CUT COMPUTATION OF 
THE ESTIMATED VARIANCE 


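If $n_h = 2$ in all strata, Keyfitz (1957) has given simple methods for computing the
approximations to the estimated variances of $\hat{Y}_{Rc}$ or $\hat{R}_c$ or, more generally, of
functions of one or more variables of the form $\hat{Y}_{st}$. For $\hat{R}_c$ we have

$$\hat{R}_c = \frac{\hat{Y}_{st}}{\hat{X}_{st}} = \frac{\sum_h \frac{N_h}{2}(y_{h1} + y_{h2})}{\sum_h \frac{N_h}{2}(x_{h1} + x_{h2})} \tag{6.54}$$

The Keyfitz method uses the identity that for $n_h = 2$,

$$2s_{yh}^2 = 2\sum_i (y_{hi} - \bar{y}_h)^2 = (y_{h1} - y_{h2})^2 = (d_{yh})^2 \tag{6.55}$$

where $d_{yh} = (y_{h1} - y_{h2})$. Hence

$$v(\hat{Y}_{st}) = \sum_h \frac{N_h^2}{2}(1 - f_h)s_{yh}^2 = \sum_h \frac{N_h^2}{4}(1 - f_h)(d_{yh})^2 = \sum_h (1 - f_h)(d_{yh}')^2 \tag{6.56}$$

where $y_{hi}' = N_h y_{hi}/2$, so that $d_{yh}' = N_h d_{yh}/2$. Similarly, for the sample estimate of the covariance,

$$\text{cov}(\hat{Y}_{st}, \hat{X}_{st}) = \sum_h (1 - f_h)(d_{yh}')(d_{xh}') \tag{6.57}$$

Now

$$v(\hat{R}_c) \approx \hat{R}_c^2\left[\frac{v(\hat{Y}_{st})}{\hat{Y}_{st}^2} + \frac{v(\hat{X}_{st})}{\hat{X}_{st}^2} - \frac{2\,\text{cov}(\hat{Y}_{st}, \hat{X}_{st})}{\hat{Y}_{st}\hat{X}_{st}}\right] \tag{6.58}$$

Since sampling is independent in different strata,

$$v(\hat{R}_c) = \hat{R}_c^2 \sum_h (1 - f_h)\left[\left(\frac{d_{yh}'}{\hat{Y}_{st}}\right)^2 + \left(\frac{d_{xh}'}{\hat{X}_{st}}\right)^2 - \frac{2 d_{yh}' d_{xh}'}{\hat{Y}_{st}\hat{X}_{st}}\right] \tag{6.59}$$

$$= \hat{R}_c^2 \sum_h (1 - f_h)\left(\frac{d_{yh}'}{\hat{Y}_{st}} - \frac{d_{xh}'}{\hat{X}_{st}}\right)^2 \tag{6.60}$$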
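A sketch of the short-cut computation for $n_h = 2$ follows. It forms the scaled differences $d_{yh}' = N_h(y_{h1} - y_{h2})/2$ and $d_{xh}' = N_h(x_{h1} - x_{h2})/2$ and sums the squared terms as in (6.60). For comparison it also builds the estimate the long way from the separate variance and covariance terms. The function name and the three-stratum sample are hypothetical.

```python
def keyfitz_variance_ratio(strata):
    """Keyfitz short-cut estimate of v(R_c) for n_h = 2 units per stratum.

    strata: list of (N_h, (y_h1, x_h1), (y_h2, x_h2)).
    Returns (R_c, v_shortcut, v_long), where v_long uses the usual
    stratified variances and covariance term by term.
    """
    y_st = sum(N * (u1[0] + u2[0]) / 2.0 for N, u1, u2 in strata)
    x_st = sum(N * (u1[1] + u2[1]) / 2.0 for N, u1, u2 in strata)
    r_c = y_st / x_st

    v_short = 0.0
    v_y = v_x = c_yx = 0.0
    for N, (y1, x1), (y2, x2) in strata:
        fpc = 1.0 - 2.0 / N
        # Short-cut: scaled within-stratum differences d'_yh and d'_xh.
        dy = N * (y1 - y2) / 2.0
        dx = N * (x1 - x2) / 2.0
        v_short += fpc * (dy / y_st - dx / x_st) ** 2
        # Long form: N_h^2 (1 - f_h) s^2 / n_h with s^2 = (difference)^2 / 2.
        v_y += N * N * fpc * ((y1 - y2) ** 2 / 2.0) / 2.0
        v_x += N * N * fpc * ((x1 - x2) ** 2 / 2.0) / 2.0
        c_yx += N * N * fpc * ((y1 - y2) * (x1 - x2) / 2.0) / 2.0
    v_short *= r_c ** 2
    v_long = r_c ** 2 * (v_y / y_st ** 2 + v_x / x_st ** 2
                         - 2.0 * c_yx / (y_st * x_st))
    return r_c, v_short, v_long

# Hypothetical sample: three strata with N_h = 4 and two units drawn from each.
sample = [(4, (2, 2), (11, 20)), (4, (5, 4), (9, 8)), (4, (3, 1), (25, 12))]
r_c, v_short, v_long = keyfitz_variance_ratio(sample)
print(r_c, v_short, v_long)
```

The two variance values agree to rounding error; the short-cut simply avoids computing the three component sums separately.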


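Keyfitz (1957) has extended this method to cover poststratified estimators and
multistage sampling, and to give variances of differences of estimates from
successive surveys in periodic samples. Woodruff (1971) gives a general approach
that handles nonlinear estimators, unequal probabilities of selection, and samples
of size $n_h$ in the strata. As an illustration of Woodruff's approach, consider a
function $f(\hat{\mathbf{Y}})$ where $\hat{\mathbf{Y}}$ represents the vector or set of $m$ variables $\hat{Y}_j = \sum_h \hat{Y}_{jh}$. With
simple random sampling in stratum $h$, the $\hat{Y}_{jh}$ are of the form $(N_h/n_h)\sum_i y_{jhi}$, the
sum extending over the $n_h$ sample units in stratum $h$. By Taylor's approximation,

$$f(\hat{\mathbf{Y}}) - f(\mathbf{Y}) \approx \sum_j \frac{\partial f}{\partial Y_j}(\hat{Y}_j - Y_j) = \sum_j \frac{\partial f}{\partial Y_j}\sum_h (\hat{Y}_{jh} - Y_{jh}) \tag{6.61}$$

The trick is to reverse the order of summation, writing

$$f(\hat{\mathbf{Y}}) - f(\mathbf{Y}) \approx \sum_h \sum_j \frac{\partial f}{\partial Y_j}(\hat{Y}_{jh} - Y_{jh}) = \sum_h (\hat{U}_h - U_h) \tag{6.62}$$

where

$$\hat{U}_h = \sum_j \frac{\partial f}{\partial Y_j}\hat{Y}_{jh} = \sum_i \left(\frac{N_h}{n_h}\sum_j \frac{\partial f}{\partial Y_j} y_{jhi}\right) = \sum_i u_{hi} \tag{6.63}$$

By evaluating the $(\partial f/\partial Y_j)$ at the estimates $\hat{Y}_j$, the $u_{hi}$ can be calculated from (6.63)
for each sample unit in stratum $h$. With simple random sampling in the strata, the
usual formula for the estimated variance of the sum $\hat{U}_h$ applies. Hence, from
(6.62), an approximate sample estimate of the variance of $f(\hat{\mathbf{Y}})$ is

$$v[f(\hat{\mathbf{Y}})] = \sum_h \frac{n_h(1 - f_h)}{n_h - 1}\left[\sum_i u_{hi}^2 - \frac{\left(\sum_i u_{hi}\right)^2}{n_h}\right] \tag{6.64}$$

The advantage of this approach is that the covariances of the $\hat{Y}_{jh}$ need not be
calculated.

To illustrate the $u_{hi}$ for the estimator $\hat{R}_c = \hat{Y}/\hat{X}$, it helps to write $\hat{R}_c = \hat{Y}_1/\hat{Y}_2$, so
that

$$\frac{\partial f}{\partial Y_1} = \frac{1}{Y_2}, \qquad \frac{\partial f}{\partial Y_2} = -\frac{Y_1}{Y_2^2} \tag{6.65}$$

Hence, from (6.63), with $n_h = 2$,

$$u_{hi} = \frac{N_h}{2}\left(\frac{y_{1hi}}{\hat{Y}_2} - \frac{\hat{Y}_1 y_{2hi}}{\hat{Y}_2^2}\right) = \frac{N_h}{2\hat{Y}_2}(y_{1hi} - \hat{R}_c y_{2hi}) \tag{6.66}$$

In terms of the original $y_{hi}$, $x_{hi}$,

$$u_{hi} = \frac{N_h}{n_h}\frac{(y_{hi} - \hat{R}_c x_{hi})}{\hat{X}} \tag{6.67}$$

Its estimated variance is approximated as in (6.64).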

The Keyfitz method is used, for instance, in the Health Interview Survey of the
National Center for Health Statistics. The sampling unit is a cluster unit: a county
or group of contiguous counties. Each unit is divided into segments of about 9
households, an average of 13 segments being chosen from a sampled unit. The
variables $y_{h1}$, $y_{h2}$ might be the numbers of persons with a specific illness and $x_{h1}$,
$x_{h2}$ the total numbers of persons in the samples from the two units in stratum $h$.

In addition to the initial geographic stratification of the counties, the persons in
the sample are poststratified by age, sex, and color. Thus, instead of the estimator
$\hat{Y}_{Rc} = X\hat{R}_c$ of the total number ill, the estimator is

$$\hat{Y}_{ps} = \sum_a X_a \hat{R}_a = \sum_a X_a \frac{\hat{Y}_a}{\hat{X}_a} \tag{6.68}$$

where $a$ represents an age-sex-color class and $X_a$ is the known total population in
this class. Here we have a function of two sets of random variables, the $\hat{Y}_a$ and the $\hat{X}_a$.
Furthermore, for many reasons, $y_{ahi}$ and $y_{a'hi}$ for two different classes $a$ and $a'$ in a
cluster unit may be correlated; for example with an infectious disease the number
of cases may be high for all classes in the unit. Applying (6.63) and ignoring the
fpc's, we have with $n_h = 2$,

$$v(\hat{Y}_{ps}) = \sum_h \left[\sum_a X_a \hat{R}_a\left(\frac{d_{yah}'}{\hat{Y}_a} - \frac{d_{xah}'}{\hat{X}_a}\right)\right]^2 \tag{6.69}$$

where $d_{yah}' = N_h(y_{ah1} - y_{ah2})/2$.

In this application the Keyfitz method also handles further complexities of the
survey: selection of primary units with unequal probabilities, adjustments for
nonresponse, and use of the method of collapsed strata. Furthermore, since
computer time even with this simple method permits variance calculations for
only a limited number of items, charts of the relation between s.d.$(\hat{Y})/\hat{Y}$ and $\hat{Y}$
are given for different types of item to help in predicting s.d.$(\hat{Y})$ for items for
which the s.d. has not been computed. Bean (1970) gives a clear presentation of
these methods and results.


6.14 OPTIMUM ALLOCATION WITH A RATIO ESTIMATE


The optimum allocation of the $n_h$ may be different with a ratio estimate than
with a mean per unit. Consider first the variate $\hat{Y}_{Rs}$. From theorem 6.4 its variance
is

$$V(\hat{Y}_{Rs}) = \sum_h \frac{N_h(N_h - n_h)}{n_h}\left(S_{yh}^2 + R_h^2 S_{xh}^2 - 2R_h \rho_h S_{yh} S_{xh}\right) \tag{6.70}$$

$$= \sum_h \frac{N_h(N_h - n_h)}{n_h} S_{dh}^2 \quad \text{with} \quad S_{dh}^2 = \frac{1}{N_h - 1}\sum_{i=1}^{N_h} d_{hi}^2 \tag{6.71}$$

where $d_{hi} = y_{hi} - R_h x_{hi}$ is the deviation of $y_{hi}$ from $R_h x_{hi}$. By the methods given in
Chapter 5 for finding optimum allocation, it follows that (6.71) is minimized
subject to a total cost of the form $\sum c_h n_h$, when

$$n_h \propto \frac{N_h S_{dh}}{\sqrt{c_h}}$$

With a mean per unit it will be recalled that for minimum variance $n_h$ is chosen
proportional to $N_h S_{yh}/\sqrt{c_h}$.

In the planning of a sample, the allocation with a ratio estimate may appear
a little perplexing, because it seems difficult to speculate about likely values of
$S_{dh}$. Two rules are helpful. With a population in which the ratio estimate is a best
linear unbiased estimate, $S_{dh}$ will be roughly proportional to $\sqrt{\bar{X}_h}$ (by theorem
6.3). In this case the $n_h$ should be proportional to $N_h\sqrt{\bar{X}_h}/\sqrt{c_h}$. Sometimes the
variance of $d_{hi}$ may be more nearly proportional to $\bar{X}_h^2$. This leads to the
allocation of $n_h$ proportional to $N_h\bar{X}_h/\sqrt{c_h}$, that is, to the stratum total of $x_{hi}$,
divided by the square root of the cost per unit. An example of this type is discussed
by Hansen, Hurwitz, and Gurney (1946) for a sample designed to estimate sales of
retail stores.

If the estimate $\hat{Y}_{Rc}$ is to be used, the same general argument applies.


Example. The different methods of allocation can be compared from data collected in a
complete enumeration of 257 commercial peach orchards in North Carolina in June 1946
(Finkner, 1950). The purpose was to determine the most efficient sampling procedure for
estimating commercial peach production in this area. Information was obtained on the
number of peach trees and the estimated total peach production in each orchard. The high
correlation between these two variables suggested the use of a ratio estimate. One very
large orchard was omitted.

For this illustration, the area is divided geographically into three strata. The number of
peach trees in an orchard is denoted by $x_{hi}$ and the estimated production in bushels of
peaches by $y_{hi}$. Only the first ratio estimate $\hat{Y}_{Rs}$ (based on a separate ratio in each stratum)
will be considered, since the principle is the same for both types of stratified ratio estimate.

Four methods of allocation are compared: (a) $n_h$ proportional to $N_h$, (b) $n_h$ proportional
to $N_h S_{yh}$, (c) $n_h$ proportional to $N_h\sqrt{\bar{X}_h}$, and (d) $n_h$ proportional to $N_h\bar{X}_h = X_h$.
The sample size is 100. The data for these comparisons are summarized in Table 6.4.


TABLE 6.4
DATA FROM THE NORTH CAROLINA PEACH SURVEY

Strata   $S_{xh}^2$   $S_{yxh}$   $S_{yh}^2$   $S_{xh}$   $S_{yh}$   $\bar{X}_h$   $\bar{Y}_h$   $R_h$     $S_{dh}^2$
1        5186         6462        8699         72.01      93.27      53.80         69.48         1.29133    658
2        2367         3100        4614         48.65      67.93      31.07         43.64         1.40475    573
3        4877         4817        7311         69.83      85.51      56.97         66.39         1.16547   2706
Pop.     3898         4434        6409         62.43      80.06      44.45         56.47         1.27053   1433

Strata   $N_h$   (a)   $N_h S_{yh}$   (b)   $\sqrt{\bar{X}_h}$   $N_h\sqrt{\bar{X}_h}$   (c)   $N_h\bar{X}_h$   (d)
1         47     18     4384           22    7.33                 344.5                   20     2529            22
2        118     46     8016           40    5.57                 657.3                   39     3666            32
3         91     36     7781           38    7.55                 687.1                   41     5184            46
Pop.     256    100    20181          100   20.45                1688.9                  100    11379           100


The upper part of the table shows the basic data. The method employed to calculate the
four variances was first to find the $n_h$ for each type of allocation. These values are shown in
the columns headed (a) through (d) in the lower part of the table. Thus, with allocation (a),
$n_h = nN_h/N$, so that in the first stratum

$$n_1 = \frac{(100)(47)}{256} = 18$$

When the $n_h$ have been obtained, the corresponding $V(\hat{Y}_{Rs})$ is found by substituting in
the formula

$$V(\hat{Y}_{Rs}) = \sum_h \frac{N_h(N_h - n_h)}{n_h} S_{dh}^2$$

where

$$S_{dh}^2 = S_{yh}^2 + R_h^2 S_{xh}^2 - 2R_h S_{yxh}$$

The quantities $S_{dh}^2$ are given on the extreme right of the top half of Table 6.4.


TABLE 6.5
COMPARISON OF FOUR METHODS OF ALLOCATION

Method of                                Variance
Allocation: $n_h$                         Strata
Proportional                                                           Relative
to                           1          2          3         Total     Precision
1. $N_h$                     49,824    105,833    376,215    531,872     100
2. $N_h S_{yh}$              35,144    131,847    343,446    510,437     104
3. $N_h\sqrt{\bar{X}_h}$     41,750    136,964    300,312    479,026     111
4. $N_h\bar{X}_h$            35,144    181,710    240,888    457,742     116


The variances and relative precisions are shown in Table 6.5.
There is not much to choose among the different allocations, as would be expected, since
the $n_h$ do not differ greatly in the four methods. Method 4, in which allocation is
proportional to the total number of peach trees in the stratum, appears a trifle superior to
the others.
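The totals in Table 6.5 can be recomputed from the stratum sizes $N_h$, the residual variances $S_{dh}^2$, and the tabulated sample sizes in Table 6.4. The sketch below takes the $n_h$ as given in the table rather than re-deriving the rounding of each allocation; the names are hypothetical, and small differences from the printed variances reflect rounding of $S_{dh}^2$.

```python
# Stratum sizes N_h and residual variances S_dh^2 from Table 6.4.
N_h = [47, 118, 91]
S_d2 = [658.0, 573.0, 2706.0]

# Sample sizes n_h for the four allocations, as tabulated in Table 6.4.
allocations = {
    "N_h": [18, 46, 36],
    "N_h * S_yh": [22, 40, 38],
    "N_h * sqrt(Xbar_h)": [20, 39, 41],
    "N_h * Xbar_h": [22, 32, 46],
}

def separate_ratio_variance(n_h):
    """V(Y_Rs) = sum over strata of N_h (N_h - n_h) S_dh^2 / n_h."""
    return sum(N * (N - n) * s2 / n for N, n, s2 in zip(N_h, n_h, S_d2))

totals = {name: separate_ratio_variance(n_h) for name, n_h in allocations.items()}
baseline = totals["N_h"]
for name, v in totals.items():
    print(f"n_h proportional to {name}: V = {v:,.0f}, "
          f"relative precision = {100.0 * baseline / v:.0f}")
```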


6.15 UNBIASED RATIO-TYPE ESTIMATES 


As we have noted, estimates of the ratio type that are unbiased or subject to a
smaller bias than $\hat{R}$ or $\hat{Y}_R$ may be useful in surveys with many strata and small
samples in each stratum if the separate ratio estimate seems appropriate. Three
methods that give unbiased estimates and three methods that remove the term of
order $1/n$ in the bias [see (6.33)] will be discussed briefly.

In comparing these methods, relevant questions are: (a) Does the MSE of the
method compare favorably with that of the ordinary ratio estimate? (b) Does the
method provide a satisfactory sample estimate of variance? This is a difficulty with
$\hat{R}$, as we have seen. We first describe the methods. The unbiased methods require
knowledge of $\bar{X}$.


Unbiased Methods 


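One estimate, due to Hartley and Ross (1954), can be derived by starting with
the mean $\bar{r}$ of the ratios $r_i = y_i/x_i$ and correcting it for bias.

$$\bar{r} = \frac{1}{n}\sum^n r_i = \frac{1}{n}\sum^n \frac{y_i}{x_i}$$

Now

$$\frac{1}{N}\sum^N r_i(x_i - \bar{X}) = \frac{1}{N}\sum^N y_i - \left(\frac{1}{N}\sum^N r_i\right)\bar{X} = \bar{Y} - \bar{X}E(r_i) = \bar{X}[R - E(r_i)] \tag{6.72}$$

But in simple random sampling $E(\bar{r}) = E(r_i)$. Hence

$$\text{bias in } \bar{r} = E(\bar{r}) - R = -\frac{1}{N\bar{X}}\sum^N r_i(x_i - \bar{X}) \tag{6.73}$$

By theorem 2.3, an unbiased sample estimate of

$$\frac{1}{N - 1}\sum^N r_i(x_i - \bar{X})$$

is

$$\frac{1}{n - 1}\sum^n r_i(x_i - \bar{x}) = \frac{n}{n - 1}(\bar{y} - \bar{r}\bar{x})$$

On substituting into (6.73), the estimate $\bar{r}$, corrected for bias, becomes

$$\hat{R}_{HR} = \bar{r} + \frac{n(N - 1)}{(n - 1)N\bar{X}}(\bar{y} - \bar{r}\bar{x}) \tag{6.74}$$

The corresponding unbiased estimate of the population total $Y$ is

$$\hat{Y}_{HR} = \hat{R}_{HR}X = \bar{r}X + \frac{n(N - 1)}{n - 1}(\bar{y} - \bar{r}\bar{x}) \tag{6.75}$$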
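The unbiasedness of the Hartley-Ross estimate can be confirmed by enumeration on a small population. The sketch below, with hypothetical function names, averages $\bar{r} + \dfrac{n(N-1)}{(n-1)N\bar{X}}(\bar{y} - \bar{r}\bar{x})$ and the ordinary ratio $\bar{y}/\bar{x}$ over all simple random samples of size 2 from a population of four units, here taken to be the values of stratum I of the artificial population in Table 6.6.

```python
from itertools import combinations

def hartley_ross(y, x, N, x_bar_pop):
    """Hartley-Ross estimate (6.74) of R from a simple random sample."""
    n = len(y)
    r_bar = sum(yi / xi for yi, xi in zip(y, x)) / n
    y_bar, x_bar = sum(y) / n, sum(x) / n
    return r_bar + n * (N - 1) / ((n - 1) * N * x_bar_pop) * (y_bar - r_bar * x_bar)

def expected_values(pop_y, pop_x, n):
    """Average R_HR and the ordinary ratio over all samples of size n."""
    N = len(pop_y)
    x_bar_pop = sum(pop_x) / N
    hr, ordinary = [], []
    for idx in combinations(range(N), n):
        ys = [pop_y[i] for i in idx]
        xs = [pop_x[i] for i in idx]
        hr.append(hartley_ross(ys, xs, N, x_bar_pop))
        ordinary.append(sum(ys) / sum(xs))
    return sum(hr) / len(hr), sum(ordinary) / len(ordinary)

# Four (y, x) pairs with R = 20/32 = 0.625; samples of size n = 2.
pop_y, pop_x = [2, 3, 4, 11], [2, 4, 6, 20]
e_hr, e_ratio = expected_values(pop_y, pop_x, n=2)
print(e_hr, e_ratio, sum(pop_y) / sum(pop_x))
```

The first printed value equals the population ratio to rounding error, while the ordinary ratio averages somewhat higher in this population.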


By similar arguments, another unbiased estimate (Mickey, 1959) is derived
from the $n$ ratios $\hat{R}_j$ obtained by removing each unit in turn from the sample, so
that $\hat{R}_j = \sum y/\sum x$ over the remaining $(n - 1)$ members. If $\hat{R}_-$ denotes the mean of
the $\hat{R}_j$, Mickey's estimate is

$$\hat{R}_M = \hat{R}_- + \frac{n(N - n + 1)}{N\bar{X}}(\bar{y} - \hat{R}_-\bar{x}) \tag{6.76}$$

As a third method, Lahiri (1951) showed that the ordinary ratio estimate $\hat{R}$ is
unbiased if the sample is drawn with probability proportional to $\sum x_i$. Perhaps the
simplest method of doing this (Midzuno, 1951) is to draw the first member of the
sample with probability proportional to $x_i$. The remaining $(n - 1)$ members of the
sample are drawn with equal probability. It is easy to prove (exercise 6.10) that
with this method the probability that a specific sample is drawn is proportional to
$\sum^n x_i$ and that $\hat{R} = \sum^n y_i / \sum^n x_i$ is unbiased for this method of sample selection.


Methods with bias of order $1/n^2$

These methods consist of an adjustment to $\hat{R}$. The first, due to Quenouille
(1956), is applicable to a broad class of statistical problems in which the proposed
estimate has a bias of order $1/n$. It has been given the name of the jackknife
method, to denote a tool with many uses. The utility of this method for ratio
estimates was pointed out by Durbin (1959).

Ignoring the fpc for the moment, the bias of estimates like $\hat{R}$ may be expanded
in a series of the form

$$E(\hat{R}) = R + \frac{b_1}{n} + \frac{b_2}{n^2} + \cdots \tag{6.77}$$

If $n = mg$, let the sample be divided at random into $g$ groups of size $m$. From (6.77)

$$E(g\hat{R}) = gR + \frac{b_1}{m} + \frac{b_2}{gm^2} + \cdots \tag{6.78}$$

Now let $\hat{R}_j$ be the ordinary ratio $\sum y/\sum x$, computed from the sample after
omitting the $j$th group. Since $\hat{R}_j$ is obtained from a simple random sample of size
$m(g - 1)$, we have

$$E(\hat{R}_j) = R + \frac{b_1}{m(g - 1)} + \frac{b_2}{m^2(g - 1)^2} + \cdots \tag{6.79}$$

Hence

$$E[(g - 1)\hat{R}_j] = (g - 1)R + \frac{b_1}{m} + \frac{b_2}{(g - 1)m^2} + \cdots \tag{6.80}$$

Subtraction from (6.78) gives, to order $n^{-2}$,

$$E[g\hat{R} - (g - 1)\hat{R}_j] = R - \frac{b_2}{g(g - 1)m^2} = R - \frac{b_2}{n^2}\frac{g}{(g - 1)}$$

The bias is now of order $1/n^2$. We can construct $g$ estimates of this type, one for
each group. Quenouille's estimator (the jackknife) is the average of these $g$
estimates, that is,

$$\hat{R}_Q = g\hat{R} - (g - 1)\hat{R}_- \tag{6.81}$$

where $\hat{R}_-$ is the average of the $g$ quantities $\hat{R}_j$. As Quenouille showed, the
variance of $\hat{R}_Q$ differs from that of $\hat{R}$ by terms of order $1/n^2$. Any increase in
variance due to this adjustment for bias should therefore be negligible in
moderately large samples. The choice $m = 1$, $g = n$ seems best with the jackknife in small
samples.
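The jackknife with $m = 1$, $g = n$ is simple to compute: form the $n$ leave-one-out ratios, average them, and combine with the full-sample ratio as in (6.81). The following sketch ignores the fpc, uses hypothetical function names, and evaluates the exact bias of both estimators over all simple random samples from a small hypothetical population so that the two can be compared.

```python
from itertools import combinations

def jackknife_ratio(y, x):
    """Quenouille's estimate (6.81) with m = 1, g = n: n R - (n - 1) R_minus,
    where R_minus averages the n leave-one-out ratios. The fpc is ignored."""
    n = len(y)
    sum_y, sum_x = sum(y), sum(x)
    r_hat = sum_y / sum_x
    r_minus = sum((sum_y - yi) / (sum_x - xi) for yi, xi in zip(y, x)) / n
    return n * r_hat - (n - 1) * r_minus

def average_over_samples(pop_y, pop_x, n, estimator):
    """Exact expectation of an estimator over all simple random samples."""
    values = [estimator([pop_y[i] for i in idx], [pop_x[i] for i in idx])
              for idx in combinations(range(len(pop_y)), n)]
    return sum(values) / len(values)

# Hypothetical population of N = 8 units.
pop_y = [2, 3, 4, 11, 2, 5, 9, 24]
pop_x = [2, 4, 6, 20, 1, 4, 8, 23]
r_pop = sum(pop_y) / sum(pop_x)
bias_ratio = average_over_samples(pop_y, pop_x, 4,
                                  lambda y, x: sum(y) / sum(x)) - r_pop
bias_jack = average_over_samples(pop_y, pop_x, 4, jackknife_ratio) - r_pop
print(bias_ratio, bias_jack)
```

When $y_i$ is exactly proportional to $x_i$, every leave-one-out ratio equals $\hat{R}$ and the jackknife leaves the estimate unchanged.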

If the fpc cannot be ignored, the leading term in the bias of $\hat{R}$, as in (6.33), is of 
the form $b_1(1-f)/n$. It can be shown (exercise 6.10) that in order to remove both 
the terms in $1/n$ and $1/N$, we need 

$$\hat{R}_Q = w\hat{R} - (w-1)\hat{R}_{-} \tag{6.82}$$

where $w = g[1-(n-m)/N]$ or, with $m = 1$, $g = n$, $w = n[1-(n-1)/N]$. 
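Beale’s (1962) estimator is 

$$\hat{R}_B = \frac{\bar{y} + [(1-f)/n](s_{yx}/\bar{x})}{\bar{x} + [(1-f)/n](s_x^2/\bar{x})} = \hat{R}\,\frac{1 + [(1-f)/n]c_{yx}}{1 + [(1-f)/n]c_{xx}} \tag{6.83}$$

while Tin’s (1965) is the closely related quantity 

$$\hat{R}_T = \hat{R}\left[1 - \frac{1-f}{n}\left(\frac{s_x^2}{\bar{x}^2} - \frac{s_{yx}}{\bar{y}\bar{x}}\right)\right] = \hat{R}\left[1 - \frac{1-f}{n}(c_{xx} - c_{yx})\right] \tag{6.84}$$

where $s_{yx} = \sum (y_i-\bar{y})(x_i-\bar{x})/(n-1)$, $s_x^2 = \sum (x_i-\bar{x})^2/(n-1)$, so that $c_{yx}$, $c_{xx}$ are 
the sample relative covariance and relative variance of $x$. 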
The structure of $\hat{R}_T$ may be seen by noting that from (6.34) the leading term in 
the expected value of $\hat{R}$ may be written 

$$R\left[1 + \frac{1-f}{n}(C_{xx} - C_{yx})\right]$$

Evidently, $\hat{R}_T$ is adjusting $\hat{R}$ by a sample estimate of the needed bias correction. 
$\hat{R}_B$ and $\hat{R}_T$ are identical to terms of order $1/n$ and in general perform very 
similarly. 
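
The two adjustments in (6.83) and (6.84) can be compared numerically. The sketch below is only an illustration: the helper names and the four-unit sample are assumptions made here, and the sampling fraction f defaults to zero.

```python
def relative_moments(y, x):
    """Return n, ybar, xbar and the sample relative moments c_yx, c_xx."""
    n = len(y)
    ybar = sum(y) / n
    xbar = sum(x) / n
    s_yx = sum((a - ybar) * (b - xbar) for a, b in zip(y, x)) / (n - 1)
    s_xx = sum((b - xbar) ** 2 for b in x) / (n - 1)
    return n, ybar, xbar, s_yx / (ybar * xbar), s_xx / xbar ** 2


def beale(y, x, f=0.0):
    """Beale's estimator, second form of (6.83)."""
    n, ybar, xbar, c_yx, c_xx = relative_moments(y, x)
    k = (1.0 - f) / n
    return (ybar / xbar) * (1.0 + k * c_yx) / (1.0 + k * c_xx)


def tin(y, x, f=0.0):
    """Tin's estimator, second form of (6.84)."""
    n, ybar, xbar, c_yx, c_xx = relative_moments(y, x)
    k = (1.0 - f) / n
    return (ybar / xbar) * (1.0 - k * (c_xx - c_yx))


# Hypothetical sample, used only to show the two adjustments side by side.
y = [2.0, 3.0, 4.0, 11.0]
x = [2.0, 4.0, 6.0, 20.0]
r_b = beale(y, x)
r_t = tin(y, x)
```

Both functions start from the ordinary ratio of the sample means and differ only in terms of order $1/n^2$, which is why the two results are close even for n = 4.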


6.16 COMPARISON OF THE METHODS 


Example. The artificial population in Table 6.6 contains three strata with $N_h = 4$, $n_h = 2$ 
in each stratum. The population was deliberately constructed so that (a) $R_h$ varies 
markedly from stratum to stratum, thus favoring a separate ratio estimate $\hat{Y}_{Rs}$, and (b) $\hat{R}_h$ 
overestimates in each stratum, with the threat of a serious cumulative bias in $\hat{Y}_{Rs}$. The 
following methods of estimating the population total $Y$ were compared. 

Simple expansion: $\sum N_h\bar{y}_h$ 
Combined ratio: $(\bar{y}_{st}/\bar{x}_{st})X$ 
Separate ratio: $\sum (\bar{y}_h/\bar{x}_h)X_h = \sum \hat{R}_h X_h$ 


TABLE 6.6 
A SMALL ARTIFICIAL POPULATION 

                      Stratum 
               I           II          III 
             y    x      y    x      y    x 
             2    2      2    1      3    1 
             3    4      5    4      7    3 
             4    6      9    8      9    4 
            11   20     24   23     25   12 
Totals      20   32     40   36     44   20 
R_h          0.625       1.111       2.200 


The remaining estimates, the separate Hartley-Ross, Lahiri, Quenouille, Beale, and 
Tin methods, have the same form as the separate ratio estimate, except that $\hat{R}_{HRh}$, $\hat{R}_{Lh}$, 
and so on, replace $\hat{R}_h$. (For $n_h = 2$, the Hartley-Ross and Mickey methods are identical.) 
There are $6^3 = 216$ possible samples. Results are exact apart from rounding errors. For help 
in some computations I am indebted to Dr. Joseph Sedransk. 

The results in Table 6.7 show some interesting features. For the combined ratio estimate, 
the contribution of the (bias)² to the mean square error is trivial, despite the extreme 
conditions, but this estimate does poorly as regards variance because of the wide variation 
in the $R_h$. As judged by the MSE, the separate ratio estimate is much more accurate than 
the combined estimate, but it is badly biased. Of the unbiased methods, Hartley-Ross 
shows relatively high variance, as it has been found to do for $n_h = 2$ in some studies on 
natural populations. The Lahiri method does particularly well. This population suits the 
Lahiri method because one unit in each stratum has unusually high values of both $y_i$ and $x_i$, 
with the consequences that this unit has a high probability of being drawn and that samples 
containing this unit give good estimates of $R_h$. 


TABLE 6.7 
RESULTS FOR DIFFERENT ESTIMATES OF Y 

Method                    Variance    (Bias)²     MSE 
Simple expansion            820.3        0.0     820.3 
Combined ratio              262.8        6.5     269.3 
Separate ratio               35.9       24.1      60.0 
Separate Hartley-Ross       153.6        0.0     153.6 
Separate Lahiri              19.6        0.0      19.6 
Separate Quenouille          42.9        1.1      44.0 
Separate Beale               28.9        8.0      36.9 
Separate Tin                 28.6        5.7      34.3 


The Quenouille, Beale, and Tin methods all produced substantial decreases in bias as 
compared with the separate ratio estimate, and all had smaller MSE’s, so that in this 
example they achieved their principal purposes. 

The study by J. N. K. Rao (1969) of natural populations cited in section 6.9 
compared the Quenouille, Beale, and Tin methods for n =2, 4, 6, 8, on 15 such 
populations. For n = 2, the most severe test, the medians and the upper quartiles 
of the quantities $|\text{bias}|/\sqrt{\text{MSE}}$ were as follows: $\hat{R}_Q$, 3%, 7%; $\hat{R}_B$, 8%, 12%; $\hat{R}_T$, 
8%, 19%; as against 15%, 20% for $\hat{R}$. The more complex methods appear to help 
materially as regards bias in these tiny samples. 

The same study compared the MSE’s of five of the estimates in this section with 
that of $\hat{R}$. (Lahiri’s method was omitted, since the study was confined to simple 
random samples.) For each method the ratio 100 MSE($\hat{R}_Q$)/MSE($\hat{R}$), and so 
forth, was calculated for each population. 
For $n = 4$, Quenouille’s and Mickey’s estimates were slightly inferior to $\hat{R}$ in 
these populations but, for $n = 6$, all methods had average MSE’s very close to 
those of $\hat{R}$. For a very small sample from a single population, this study suggests 
that these more complex methods have no material advantage in accuracy over $\hat{R}$. 
But the fact that they reduce bias with little or no increase in MSE in a single 
stratum should give them an advantage in a separate ratio estimate with numerous 
strata having small samples. 
Under a linear regression model, comparisons of the MSE’s of these methods 
for small $n$ by P.S.R.S. Rao and J. N. K. Rao (1971), Hutchinson (1971), and J. N. 
K. Rao and Kuzik (1974) gave results in general agreeing with those from the 
natural populations. 


6.17 IMPROVED ESTIMATION OF VARIANCE 


One estimator worth consideration for moderate or small samples was 
suggested by Tukey (1958) for the jackknife (Quenouille’s) estimate $\hat{R}_Q$. With $g$ 
groups, $n = mg$, and $f$ negligible, $\hat{R}_Q$ is the average of the $g$ quantities 

$$\hat{R}_j' = g\hat{R} - (g-1)\hat{R}_j$$

where $\hat{R}_j = \sum y/\sum x$, computed after omitting group $j$. If the $\hat{R}_j'$ could be regarded 
as $g$ independent estimates of $R$ then, with simple random sampling, an unbiased 
estimate of $V(\hat{R}_Q)$ would be 

$$v(\hat{R}_Q) = \frac{(1-f)\sum_j (\hat{R}_j' - \hat{R}_Q)^2}{g(g-1)} \tag{6.85}$$

Since $\hat{R}_j' - \hat{R}_Q = -(g-1)(\hat{R}_j - \hat{R}_{-})$, (6.85) is more easily calculated as 

$$v(\hat{R}_Q) = \frac{(1-f)(g-1)}{g}\sum_j (\hat{R}_j - \hat{R}_{-})^2 \tag{6.86}$$

where $\hat{R}_{-}$ is the mean of the $g$ quantities $\hat{R}_j$. 
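
The equivalence of (6.85) and (6.86) is easy to check numerically. The following Python sketch evaluates both forms on a small hypothetical sample; the function name, the data, and the use of consecutive blocks as groups are assumptions made for illustration only.

```python
def jackknife_variance(y, x, g, f=0.0):
    """Return (R_Q, v from (6.85), v from (6.86)) for g equal-sized groups.

    Groups are consecutive blocks of the sample, which assumes the units
    are already in random order.
    """
    n = len(y)
    if n % g != 0:
        raise ValueError("n must be a multiple of g")
    m = n // g
    r_full = sum(y) / sum(x)
    r_omit = []
    for j in range(g):
        keep = [i for i in range(n) if not (j * m <= i < (j + 1) * m)]
        r_omit.append(sum(y[i] for i in keep) / sum(x[i] for i in keep))
    r_bar = sum(r_omit) / g
    # Quantities R_j' = g*R - (g-1)*R_j, whose average is R_Q.
    pseudo = [g * r_full - (g - 1) * r for r in r_omit]
    r_q = sum(pseudo) / g
    v_685 = (1.0 - f) * sum((p - r_q) ** 2 for p in pseudo) / (g * (g - 1))
    v_686 = (1.0 - f) * (g - 1) / g * sum((r - r_bar) ** 2 for r in r_omit)
    return r_q, v_685, v_686


# Hypothetical data with m = 1, g = n = 4.
y = [2.0, 3.0, 4.0, 11.0]
x = [2.0, 4.0, 6.0, 20.0]
r_q, v_685, v_686 = jackknife_variance(y, x, g=4)
```

The second form avoids computing the quantities $\hat{R}_j'$ explicitly, which is the computational saving the text refers to.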

The $\hat{R}_j'$ or $\hat{R}_j$ for different $j$ are, of course, not independent, and formula (6.86) 
is an approximation. So far, the analytical properties of $v(\hat{R}_Q)$ have been 
established only for large samples. Arvesen (1969) showed that for a broad 
class of estimates including $\hat{R}_Q$ that are symmetrical in the elements of the sample 
[estimates known as Hoeffding’s U-Statistics (1948)], the formula (6.86) 
becomes unbiased either for fixed $g$ or for $g = n$ as $n$ becomes large. 

From Rao’s (1969) study of eight natural populations, small-sample average 
percent underestimations in the standard $v_1(\hat{R})$ as an estimate of the true 
MSE($\hat{R}$) were reported in section 6.9 for $n = 4, 6, 8, 12$. The corresponding 
average percent biases in $v(\hat{R}_Q)$ are shown for comparison in Table 6.8; these are 
the averages of the eight numbers: $100[v(\hat{R}_Q) - \text{MSE}(\hat{R}_Q)]/\text{MSE}(\hat{R}_Q)$. 

Table 6.8 also gives the averages of the quantities $100[v(\hat{R}_Q) - 
\text{MSE}(\hat{R})]/\text{MSE}(\hat{R})$. These averages are of interest to the investigator who 
uses $\hat{R}$ but is willing to replace $v_1(\hat{R})$ by $v(\hat{R}_Q)$ as an estimator of $V(\hat{R})$ if it seems 
less biased. In view of the biases in $\hat{R}$ and $\hat{R}_Q$, the comparisons of $v(\hat{R}_Q)$ are made 
with the MSE’s as more appropriate. 


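TABLE 6.8 
AVERAGE PERCENT BIAS IN ESTIMATORS OF VARIANCE 

                                                                  n = 
Average of                                                  4       6       8      12 

$100[v(\hat{R}_Q) - \text{MSE}(\hat{R}_Q)]/\text{MSE}(\hat{R}_Q)$      +11%    +10%     +6%     +1% 
$100[v(\hat{R}_Q) - \text{MSE}(\hat{R})]/\text{MSE}(\hat{R})$          +11%    +10%     +6%     +1% 
$100[v_1(\hat{R}) - \text{MSE}(\hat{R})]/\text{MSE}(\hat{R})$          -31%    -23%    -21%    -18% 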


In these populations $v(\hat{R}_Q)$ is a slight overestimate of both MSE($\hat{R}_Q$) and 
MSE($\hat{R}$), while $v_1(\hat{R})$ has substantial negative biases in these small samples. 
The stability of $v(\hat{R}_Q)$ relative to that of $v_1(\hat{R})$, as judged by the squares of the 
coefficients of variation of these variance estimates, was poor in these samples. In 
studies by Rao and Beegle (1968) of $v(\hat{R}_Q)$ and $v_1(\hat{R})$ under a linear regression 
model of $y$ on $x$ in an infinite population with $x$ normal, however, $v(\hat{R}_Q)$ and 
$v_1(\hat{R})$ appeared about equally stable for $n = 4$ to $n = 12$. 

With a separate ratio estimate and numerous strata these results suggest that 
$\sum X_h^2 v(\hat{R}_{Qh})$ is superior to $\sum X_h^2 v_1(\hat{R}_h)$ as an estimator of $V(\hat{Y}_{Rs})$. The former is 
likely to be freer from bias and both should have adequate stability. But with only 
a few strata the issue is questionable until further comparisons appear. 


6.18 COMPARISON OF TWO RATIOS 


In analytical surveys it is frequently necessary to estimate the difference $R - R'$ 
between two ratios and to compute the standard error of $\hat{R} - \hat{R}'$. The formulas 
given here are for the estimated variance of $\hat{R} - \hat{R}'$, since these are the ones most 
commonly required. The fpc terms are omitted for reasons presented in section 
2.14. 


Simple random sampling is assumed at first. Three cases can be distinguished. 


The Two Ratios Are Independent 


This occurs when the units are classified into two distinct classes and we wish to 
compare ratios estimated separately in the two classes. For instance, in a study of 
household expenditures, a simple random sample of households might be subdivided 
into owned and rented houses in order to compare the proportions of 
income spent on upkeep of the house in the two classes. If the estimated ratios are 
denoted by $\hat{R} = \bar{y}/\bar{x}$, $\hat{R}' = \bar{y}'/\bar{x}'$, then 

$$v(\hat{R} - \hat{R}') = v(\hat{R}) + v(\hat{R}') \tag{6.87}$$


The Two Ratios Have the Same Denominator 


When the unit is a cluster of families, we might wish to compare the proportion 
of adult males who use electric shavers with the proportion who use razors. In any 
unit, $y$ = number of adult males using electric shavers, $y'$ = number of adult males 
using razors, and $x$ = total number of adult males. 

$$\hat{R} - \hat{R}' = \frac{\bar{y} - \bar{y}'}{\bar{x}}$$

If $d_i = y_i - y_i'$, the estimated variance of $\hat{R} - \hat{R}'$ may be computed as 

$$v(\hat{R} - \hat{R}') = \frac{1}{n\bar{x}^2}\,\frac{\sum^n [d_i - (\hat{R} - \hat{R}')x_i]^2}{n-1} \tag{6.88}$$


The Two Ratios Have Different Denominators But May Be Correlated 


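An example is the comparison of the proportion of men who smoke with the 
proportion of women who smoke, in a survey in which the unit is a cluster of 
houses. Mathematically, this is the most general case. 

$$v(\hat{R} - \hat{R}') = v(\hat{R}) + v(\hat{R}') - 2\,\text{cov}(\hat{R}\hat{R}') \tag{6.89}$$

The only unfamiliar term is cov($\hat{R}\hat{R}'$). Writing, in the usual way, 

$$\hat{R} - R \doteq \frac{\bar{y} - R\bar{x}}{\bar{X}}, \qquad \hat{R}' - R' \doteq \frac{\bar{y}' - R'\bar{x}'}{\bar{X}'}$$

we have 

$$\text{cov}(\hat{R}\hat{R}') \doteq \frac{1}{n\bar{X}\bar{X}'}\,\text{cov}(y_i - Rx_i)(y_i' - R'x_i')$$

A sample estimate may be computed as follows: 

$$\text{cov}(\hat{R}\hat{R}') = \frac{1}{n(n-1)\bar{x}\bar{x}'}\sum (y_iy_i' - \hat{R}y_i'x_i - \hat{R}'y_ix_i' + \hat{R}\hat{R}'x_ix_i') \tag{6.90}$$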


Example. The 1954 field trial of the Salk polio vaccine was conducted among children 
in the first three grades in all schools in a number of counties. The counties were not 
randomly selected, since those with a history of previous polio attacks were favored, but for 
this illustration, it will be assumed that they are a random sample from some population. 

Children whose parents did not give permission to participate in the trial were called the 
“not inoculated” group and, of course, received no shots. Half of the children who received 
permission were given three shots of an inert liquid and were called the “placebo” group. 
From the data in Table 6.9, compare the frequencies R, R' of paralytic polio in the “not 
inoculated” and “placebo” groups. To reduce the amount of data, the comparison is 
restricted to 34 counties, each having more than 4000 children in the two groups combined. 

In these data any variation in the polio attack rate from county to county would produce a 
positive correlation between R and R’. 

The following quantities are derived from the totals. 

Placebo: $\hat{R} = 88/167.4 = 0.525687$, $\bar{x} = 167.4/34 = 4.9235$ 
Not inoculated: $\hat{R}' = 99/284.6 = 0.347857$, $\bar{x}' = 284.6/34 = 8.3706$ 

For $v(\hat{R})$, $v(\hat{R}')$ and cov($\hat{R}\hat{R}'$), all uncorrected sums of squares and products among the 
four variates are required. 

$$v(\hat{R}) = \frac{1}{n(n-1)\bar{x}^2}\left(\sum y^2 - 2\hat{R}\sum yx + \hat{R}^2\sum x^2\right)$$

$$= \frac{1}{(34)(33)(4.9235)^2}[(564) - (1.05137)(822.2) + (0.27635)(1661.92)]$$

$$= 0.00584$$




TABLE 6.9 
NUMBER OF CHILDREN (x, x') AND OF PARALYTIC CASES (y, y') PER COUNTY 

   x*     x'    y†    y'         x      x'     y     y' 
  4.1    2.4     0     0       13.8   25.6     3     3 
  3.5    8.0     1     6       10.5    8.1     2     0 
  4.1    6.1     7     2       21.6   25.9    10     7 
  2.6    4.6     2     1        3.5    6.7     2     2 
  2.4    1.5     2     1        6.8    7.3     3     8 
  2.2    1.9     0     0        2.3    3.7     0     1 
  1.1    4.0     1     1        2.6    2.9     2     0 
  1.6    4.0     1     2        6.0   11.1     3     1 
  5.7    7.8     1     4       11.0   14.8     7    11 
  3.3   11.0     3     7       19.4   42.5    11    14 
  1.0    3.8     0     1        6.8   13.7     6     2 
  2.0    5.2     1     0        1.2    4.0     3     1 
  8.3   19.0     4     4        5.4    9.3    11     6 
  1.0    3.7     1     5        1.7    2.6     0     2 
  1.1    4.2     0     1        2.1    2.3     0     0 
  2.3    6.8     1     2        1.5    2.6     0     0 
  1.9    3.5     0     2        3.0    4.0     0     2 

Totals: x = 167.4, x' = 284.6, y = 88, y' = 99 

* x, x' = numbers of “placebo” and “not inoculated” children (in 1000's) 
† y, y' = numbers of paralytic polio cases in the placebo and not inoculated groups 


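Similarly, we find $v(\hat{R}') = 0.00240$. 

$$\text{cov}(\hat{R}\hat{R}') = \frac{1}{n(n-1)\bar{x}\bar{x}'}\left(\sum yy' - \hat{R}\sum y'x - \hat{R}'\sum yx' + \hat{R}\hat{R}'\sum xx'\right)$$

$$= \frac{(497) - (0.52569)(844.6) - (0.34786)(1397.4) + (0.52569)(0.34786)(2690.8)}{(34)(33)(4.9235)(8.3706)}$$

$$= 0.00127$$

Hence 

$$\text{s.e.}(\hat{R} - \hat{R}') = \sqrt{0.00584 + 0.00240 - 0.00254} = 0.0754$$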


Since $\hat{R} - \hat{R}' = 0.1778$, the difference approaches significance at the 5% level (the distribution 
of $\hat{R} - \hat{R}'$ may be somewhat skew for this size of sample). A possible explanation is that 
the not-inoculated children may have had more natural protection against polio than the 
placebo children. 
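
The arithmetic of this example can be retraced with a short Python sketch of (6.89) and (6.90). The county values are transcribed from Table 6.9; a few entries that are hard to read in the printed table were chosen so that the column totals and the sums of squares and products quoted in the example are reproduced, so the data list should be treated as an assumption. Variable names are illustrative.

```python
from math import sqrt

# (x, x', y, y') for the 34 counties of Table 6.9: children in 1000's
# (placebo, not inoculated) and paralytic cases (placebo, not inoculated).
# Illegible entries were inferred from the printed totals (an assumption).
counties = [
    (4.1, 2.4, 0, 0), (3.5, 8.0, 1, 6), (4.1, 6.1, 7, 2), (2.6, 4.6, 2, 1),
    (2.4, 1.5, 2, 1), (2.2, 1.9, 0, 0), (1.1, 4.0, 1, 1), (1.6, 4.0, 1, 2),
    (5.7, 7.8, 1, 4), (3.3, 11.0, 3, 7), (1.0, 3.8, 0, 1), (2.0, 5.2, 1, 0),
    (8.3, 19.0, 4, 4), (1.0, 3.7, 1, 5), (1.1, 4.2, 0, 1), (2.3, 6.8, 1, 2),
    (1.9, 3.5, 0, 2),
    (13.8, 25.6, 3, 3), (10.5, 8.1, 2, 0), (21.6, 25.9, 10, 7), (3.5, 6.7, 2, 2),
    (6.8, 7.3, 3, 8), (2.3, 3.7, 0, 1), (2.6, 2.9, 2, 0), (6.0, 11.1, 3, 1),
    (11.0, 14.8, 7, 11), (19.4, 42.5, 11, 14), (6.8, 13.7, 6, 2), (1.2, 4.0, 3, 1),
    (5.4, 9.3, 11, 6), (1.7, 2.6, 0, 2), (2.1, 2.3, 0, 0), (1.5, 2.6, 0, 0),
    (3.0, 4.0, 0, 2),
]
n = len(counties)
x = [c[0] for c in counties]
xp = [c[1] for c in counties]
y = [c[2] for c in counties]
yp = [c[3] for c in counties]


def dot(a, b):
    """Uncorrected sum of products of two variates."""
    return sum(u * v for u, v in zip(a, b))


R, Rp = sum(y) / sum(x), sum(yp) / sum(xp)
xbar, xpbar = sum(x) / n, sum(xp) / n

# Estimated variances of the two ratios (fpc omitted).
v_R = (dot(y, y) - 2 * R * dot(y, x) + R ** 2 * dot(x, x)) / (n * (n - 1) * xbar ** 2)
v_Rp = (dot(yp, yp) - 2 * Rp * dot(yp, xp) + Rp ** 2 * dot(xp, xp)) / (n * (n - 1) * xpbar ** 2)

# Estimated covariance, formula (6.90).
cov = (dot(y, yp) - R * dot(yp, x) - Rp * dot(y, xp) + R * Rp * dot(x, xp)) / (
    n * (n - 1) * xbar * xpbar
)

# Standard error of the difference, from (6.89).
se_diff = sqrt(v_R + v_Rp - 2 * cov)
```

Carrying full precision rather than the rounded intermediate values gives results that agree with the printed figures to the number of digits shown there.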




The same problem may arise in stratified samples in which the domains of study 
cut across strata. If $\hat{R}_h$, $\hat{R}_h'$ appear to vary from stratum to stratum, the 
comparison will probably be based on an examination of the values of $\hat{R}_h - \hat{R}_h'$ in 
individual strata. By finding the standard errors of $\hat{R}_h - \hat{R}_h'$ it is possible to 
determine whether these differences vary from stratum to stratum and, if not, to 
compute an efficient over-all difference. 

If the $\hat{R}_h - \hat{R}_h'$ exhibit no real variation from stratum to stratum, it may be 
sufficient to compare the combined estimates $\hat{R}_c$ and $\hat{R}_c'$. As before, 

$$v(\hat{R}_c - \hat{R}_c') = v(\hat{R}_c) + v(\hat{R}_c') - 2\,\text{cov}(\hat{R}_c\hat{R}_c') \tag{6.91}$$

where, putting $d_{hi} = (y_{hi} - \bar{y}_h) - \hat{R}_c(x_{hi} - \bar{x}_h)$, 

$$v(\hat{R}_c) = \frac{1}{\hat{X}_{st}^2}\sum_h \frac{N_h^2}{n_h(n_h - 1)}\sum_i d_{hi}^2 \tag{6.92}$$

$$\text{cov}(\hat{R}_c\hat{R}_c') = \frac{1}{\hat{X}_{st}\hat{X}_{st}'}\sum_h \frac{N_h^2}{n_h(n_h - 1)}\sum_i d_{hi}d_{hi}' \tag{6.93}$$


A more thorough discussion of the comparison of ratios, including shortcut 
computing formulas when the sample permits them, has been given by Kish and 
Hess (1959b). 


6.19 RATIO OF TWO RATIOS 

In some applications we want to estimate the ratio $R/R'$ of two ratios. Thus, in 
the preceding example (section 6.18), we might be interested in the ratio $R/R'$ of 
the paralytic polio case rates for “placebo” and “not inoculated” children, or the 
ratio of the proportions of males and females in the labor force from a cluster 
sample. If data on $(y, x)$ are available in the same sample for two time periods, the 
quantity might be the ratio of the weekly expenditures on food per household at 
the two times. 

With a simple random sample (e.g., of clusters) the sample estimate of $R/R'$ is 
$\hat{R}/\hat{R}' = (\bar{y}/\bar{x})/(\bar{y}'/\bar{x}')$, sometimes called a double ratio estimate. As with the single 
ratio the leading term in the bias of $\hat{R}/\hat{R}'$ is of order $1/n$, but is more complex than 
for $\hat{R}$ or $\hat{R}'$. We may write 

$$\hat{R}/\hat{R}' = (R/R')\,\frac{(1 + \delta\bar{y})(1 + \delta\bar{x}')}{(1 + \delta\bar{x})(1 + \delta\bar{y}')} \tag{6.94}$$


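where $\delta\bar{y}$ denotes $(\bar{y} - \bar{Y})/\bar{Y}$, and so forth. When this expression is expanded, 
there are six quadratic terms of order $1/n$ that enter into the bias of $\hat{R}/\hat{R}'$. Rao 
and Pereira (1968) give an exact expression for the bias. 

Formula (6.8) for the relative variance $V(\hat{R})/R^2 = C_{\hat{R}\hat{R}}$ of a ratio can be 
written 

$$C_{\hat{R}\hat{R}} = C_{\bar{y}\bar{y}} + C_{\bar{x}\bar{x}} - 2C_{\bar{y}\bar{x}}$$

From this, the leading term in $V(\hat{R}/\hat{R}')$ is 

$$V(\hat{R}/\hat{R}') = \left(\frac{R}{R'}\right)^2 (C_{\hat{R}\hat{R}} + C_{\hat{R}'\hat{R}'} - 2C_{\hat{R}\hat{R}'}) \tag{6.95}$$

where 

$$C_{\hat{R}\hat{R}'} = C_{\bar{y}\bar{y}'} + C_{\bar{x}\bar{x}'} - C_{\bar{y}\bar{x}'} - C_{\bar{y}'\bar{x}} \tag{6.96}$$

For the corresponding sample estimate $v(\hat{R}/\hat{R}')$ we substitute sample estimates of 
the terms in (6.95). 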


Example. For the ratio of placebo and not inoculated case rates, $\hat{R}/\hat{R}' = 
0.52569/0.34786 = 1.511$. Estimate the s.e. of this ratio. The computations in the preceding 
example give 

$$c_{\hat{R}\hat{R}} = \frac{0.00584}{(0.5257)^2} = 0.0211; \qquad c_{\hat{R}'\hat{R}'} = \frac{0.00240}{(0.3479)^2} = 0.0198;$$

$$c_{\hat{R}\hat{R}'} = \frac{0.00127}{(0.5257)(0.3479)} = 0.0069$$

$$v(\hat{R}/\hat{R}') = (1.511)^2[(0.0211 + 0.0198) - 0.0139] = 0.0617$$

$$\text{s.e.}(\hat{R}/\hat{R}') = 0.248$$
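
The same calculation can be sketched in Python, starting from the rounded values quoted in the two examples. The variable names are illustrative assumptions; because intermediate quantities are not rounded here, the last digit can differ slightly from the printed figures.

```python
from math import sqrt

# Values quoted in the examples of sections 6.18 and 6.19.
R, Rp = 0.52569, 0.34786                     # placebo and not inoculated rates
v_R, v_Rp, cov = 0.00584, 0.00240, 0.00127   # estimated variances and covariance

# Sample relative variances and covariance of the two ratios.
c_rr = v_R / R ** 2
c_rprp = v_Rp / Rp ** 2
c_rrp = cov / (R * Rp)

# Sample version of (6.95).
double_ratio = R / Rp
v_double = double_ratio ** 2 * (c_rr + c_rprp - 2 * c_rrp)
se_double = sqrt(v_double)
```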


The double ratio estimate has occasionally been used in place of $\hat{Y}_R = \hat{R}X$ to estimate a 
population total $Y$, as suggested by Keyfitz (Yates, 1960). Suppose that $\hat{R}' = (\bar{y}'/\bar{x}')$ is 
known for the same sample from a previous period and that $R' = Y'/X'$ is also known. If 
$\hat{R}'/R'$ has been found, say, to be slightly > 1, we might argue intuitively that $\hat{R}$ is also likely 
to give an overestimate of $R$ that should be adjusted downward by dividing it by the ratio 
$\hat{R}'/R'$. This leads to the double ratio estimate $\hat{Y}_{DR}$: 

$$\hat{Y}_{DR} = \frac{\hat{R}}{\hat{R}'}R'X \tag{6.97}$$

Since the relative variance of $\hat{Y}_R$ is $C_{\hat{R}\hat{R}}$, while that of $\hat{Y}_{DR}$ is 

$$C_{\hat{R}\hat{R}} + C_{\hat{R}'\hat{R}'} - 2C_{\hat{R}\hat{R}'}$$

the double ratio will give a more precise estimate in large samples if the correlation between 
$\hat{R}$ and $\hat{R}'$ is high enough. 


6.20 MULTIVARIATE RATIO ESTIMATES 


Olkin (1958) has extended the ratio estimate to the situation in which $p$ 
auxiliary x-variables $(x_1, x_2, \ldots, x_p)$ are available. For the population total, the 
proposed estimate, say $\hat{Y}_{MR}$ for multivariate ratio, is 

$$\hat{Y}_{MR} = W_1\frac{\bar{y}}{\bar{x}_1}X_1 + W_2\frac{\bar{y}}{\bar{x}_2}X_2 + \cdots + W_p\frac{\bar{y}}{\bar{x}_p}X_p$$

$$= W_1\hat{Y}_{R_1} + W_2\hat{Y}_{R_2} + \cdots + W_p\hat{Y}_{R_p}$$

where the $W_i$ are weights to be determined to maximize the precision of $\hat{Y}_{MR}$, 
subject to $\sum W_i = 1$. This type of estimate appears appropriate when the regression 
of $y$ on $x_1, x_2, \ldots, x_p$ is linear and passes through the origin. The population 
totals $X_i$ must be known. 

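The method is described for two x-variates, since this should be the most 
frequent application. We have 

$$\hat{Y}_{MR} - Y = W_1(\hat{Y}_{R_1} - Y) + W_2(\hat{Y}_{R_2} - Y)$$

Hence, assuming negligible bias, 

$$V(\hat{Y}_{MR}) = W_1^2V(\hat{Y}_{R_1}) + 2W_1W_2\,\text{cov}(\hat{Y}_{R_1}\hat{Y}_{R_2}) + W_2^2V(\hat{Y}_{R_2})$$

$$= W_1^2V_{11} + 2W_1W_2V_{12} + W_2^2V_{22} \tag{6.99}$$

where $V_{11} = V(\hat{Y}_{R_1})$, etc. The values of $W_1$, $W_2$ that minimize the variance, 
subject to $W_1 + W_2 = 1$, are found to be 

$$W_1 = \frac{V_{22} - V_{12}}{V_{11} + V_{22} - 2V_{12}}, \qquad W_2 = \frac{V_{11} - V_{12}}{V_{11} + V_{22} - 2V_{12}}$$

and the minimum variance is 

$$V_{min}(\hat{Y}_{MR}) = \frac{V_{11}V_{22} - V_{12}^2}{V_{11} + V_{22} - 2V_{12}}$$

With $p$ variates, it is necessary to compute the inverse $V^{-1}$ of the matrix $V_{ij}$. Then 
the optimum $W_i = \Sigma_i/\Sigma$, where $\Sigma_i$ is the sum of the elements in the $i$th column of 
$V^{-1}$ and $\Sigma$ is the sum of all the $p^2$ elements of $V^{-1}$. The minimum variance is $1/\Sigma$. 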

In practice, the weights are determined from estimated variances and 
covariances $v_{ij}$. From (6.7) in section 6.3, 

$$v_{11} = v(\hat{Y}_{R_1}) = \frac{(1-f)\hat{Y}^2}{n}(c_{yy} + c_{11} - 2c_{y1}) \tag{6.100}$$

$$v_{22} = \frac{(1-f)\hat{Y}^2}{n}(c_{yy} + c_{22} - 2c_{y2})$$

where $c_{yy} = s_y^2/\bar{y}^2$, etc. The covariance can be expressed as 

$$v_{12} = \frac{(1-f)\hat{Y}^2}{n}(c_{yy} + c_{12} - c_{y1} - c_{y2})$$


A convenient method of computation is first to obtain the matrix 

$$C = \begin{pmatrix} c_{yy} & c_{y1} & c_{y2} \\ c_{y1} & c_{11} & c_{12} \\ c_{y2} & c_{12} & c_{22} \end{pmatrix}$$

If $v_{ij}' = nv_{ij}/(1-f)\hat{Y}^2$, the matrix $v_{ij}'$ is easily obtained by taking diagonal contrasts 
in $C$, that is, 

$$v_{11}' = c_{yy} + c_{11} - c_{y1} - c_{y1}$$

$$v_{12}' = c_{yy} + c_{12} - c_{y1} - c_{y2} \quad \text{etc.}$$

The factor $(1-f)\hat{Y}^2/n$ is not needed when computing the $w_i$ but it must be 
inserted when computing the minimum variance. Thus 

$$v_{min}(\hat{Y}_{MR}) = \frac{(1-f)\hat{Y}^2}{n}\,\frac{(v_{11}'v_{22}' - v_{12}'^2)}{(v_{11}' + v_{22}' - 2v_{12}')} \tag{6.101}$$

In view of the amount of computation involved, this estimate will probably be 
restricted to smaller surveys of specialized scope. The method is capable of giving 
a marked increase in precision over $\hat{Y}_{R_1}$ or $\hat{Y}_{R_2}$ alone. 
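
The two-variate computation, from the matrix $C$ to the weights and the scaled minimum variance, can be sketched as follows. The function name and the entries of the matrix are hypothetical and serve only to show the order of the steps; the returned minimum variance is on the scale of the $v_{ij}'$, so the factor $(1-f)\hat{Y}^2/n$ still has to be applied.

```python
def olkin_two_variate(c):
    """Weights and scaled minimum variance for two auxiliary variates.

    c is the 3 x 3 symmetric matrix of sample relative variances and
    covariances, ordered (y, x1, x2). The returned variance must still be
    multiplied by (1 - f) * Y_hat**2 / n.
    """
    c_yy, c_y1, c_y2 = c[0]
    c_11, c_12 = c[1][1], c[1][2]
    c_22 = c[2][2]
    # Diagonal contrasts in C.
    v11 = c_yy + c_11 - 2 * c_y1
    v22 = c_yy + c_22 - 2 * c_y2
    v12 = c_yy + c_12 - c_y1 - c_y2
    denom = v11 + v22 - 2 * v12
    w1 = (v22 - v12) / denom
    w2 = (v11 - v12) / denom
    v_min = (v11 * v22 - v12 ** 2) / denom
    return (w1, w2), (v11, v12, v22), v_min


# Hypothetical C matrix (not taken from any survey).
C = [
    [1.0, 0.8, 0.6],
    [0.8, 1.2, 0.5],
    [0.6, 0.5, 0.9],
]
(w1, w2), (v11, v12, v22), v_min = olkin_two_variate(C)
```

With these made-up values the weighted estimate has a smaller scaled variance than either single ratio estimate, which is the gain in precision the method is intended to deliver.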


6.21 PRODUCT ESTIMATORS 


If an auxiliary variate $x$ has a negative correlation with $y$ where $x$ and $y$ are 
variates that take only positive values, a natural analogue of the ratio estimator is 
the product estimator, for which 

$$\hat{\bar{Y}}_p = \frac{\bar{y}\bar{x}}{\bar{X}}, \qquad \hat{Y}_p = N\frac{\bar{y}\bar{x}}{\bar{X}} \tag{6.102}$$

By the usual Taylor series expansion, the analogue of (6.8) for the product 
estimator in a large simple random sample is 

$$(cv)^2 = \frac{1-f}{n}(C_{yy} + 2C_{yx} + C_{xx}) \tag{6.103}$$

where $(cv)^2$ is the square of the coefficient of variation of either $\hat{\bar{Y}}_p$ or $\hat{Y}_p$. P.S.R.S. 
Rao and Mudholkar (1967) have extended Olkin’s multivariate ratio estimator to 
a weighted combination of ratio estimators (for $x_i$ positively correlated with $y$) 
and product estimators (for $x_i$ negatively correlated with $y$). 


EXERCISES 


6.1 A pilot survey of 21 households gave the following data for numbers of members 
($x$), children ($y_1$), cars ($y_2$), and TV sets ($y_3$). 


= H Ye Ys z Yi Yo Ys 


BY Ua Gus 
SX I 1 3 2 0 90 1 Gilad DNO, 
25050 1 1 3 1 1 1 es} 1 1 
4 20 Bre S E 4 2 1 1 
4 2 1 1 6 4 2 1 3 1 0 1 
6 4 1 1 3 a s0910 PAU AA) 1 
3 Pe et 2 4 2 1 1 4 2 1 1 
Sie RSP pal 1 SPC 1 1 3 1 1 1 

Assuming that the total population $X$ is known, would you recommend that ratio 
estimates be used instead of simple expansions for estimating total numbers of children, 
cars, and TV sets? 

6.2 In a field of barley the grain, $y_i$, and the grain plus straw, $x_i$, were weighed for each 
of a large number of sampling units located at random over the field. The total produce 
(grain plus straw) of the whole field was also weighed. The following data were obtained: 
$c_{yy} = 1.13$, $c_{yx} = 0.78$, $c_{xx} = 1.11$. Compute the gain in precision obtained by estimating the 
grain yield of the field from the ratio of grain to total produce instead of from the mean yield 
of grain per unit. 

It requires 20 min to cut, thresh, and weigh the grain on each unit, 2 min to weigh the 
straw on each unit, and 2 hr to collect and weigh the total produce of the field. How many 
units must be taken per field in order that the ratio estimate may be more economical than 
the mean per unit? 

6.3 For the data in Table 6.1, g =28,367 and cs; =0.0142068, cy, =0.0146541, 
Czz = 0.0156830. Compute the 95% quadratic confidence limits for Y and compare them 
with the limits found by the normal approximation. _ 

6.4 The values of $y$ and $x$ are measured for each unit in a simple random sample from a 
population. If $\bar{X}$, the population mean of $x$, is known, which of the following procedures do 
you recommend for estimating $\bar{Y}/\bar{X}$? (a) Always use $\bar{y}/\bar{X}$. (b) Sometimes use $\bar{y}/\bar{X}$ and 
sometimes $\bar{y}/\bar{x}$. (c) Always use $\bar{y}/\bar{x}$. Give reasons for your answer. 

6.5 The following data are for a small artificial population with N = 8 and two strata of 
equal size. 


Stratum 1 Stratum 2 
x_1i  y_1i  x_2i  y_2i 
2 0 10 7 
5 3 18 15 
9 7 21 10 
15 10 SSN 16 


For a stratified random sample in which $n_1 = n_2 = 2$, compare the MSE’s of $\hat{Y}_{Rs}$ and $\hat{Y}_{Rc}$ by 
working out the results for all possible samples. To what extent is the difference in MSE’s 
due to biases in the estimates? 

6.6. In exercise 6.5 compute the variance given by using Lahiri’s method of sample 
selection within each stratum and a separate ratio estimate. 

6.7 Forty-five states of the United States (excluding the five largest) were arranged in 
nine strata with five states each, states in the same stratum having roughly the same ratio of 
1950 to 1940 population. A stratified random sample with $n_h = 2$ gave the following results 
for 1960 population ($y$) and 1950 population ($x$), in millions. 


Stratum 

Given that the 1950 population total $X$ is 97.94, estimate the 1960 population by the 
combined ratio estimate. Find the standard error of your estimate by Keyfitz’ short-cut 
method (section 6.13). The correct 1960 total was 114.99. Does your estimate agree with 
this figure within sampling errors? 

6.8 In the example of a bivariate ratio estimate given by Olkin, a sample of 50 cities was 
drawn from a population of 200 large cities. The variates $y$, $x_1$, $x_2$ are the numbers of 
inhabitants per city in 1950, 1940, and 1930, respectively. For the population, Y = 1699, 
X, = 1482, X2 = 1420 (in 100’s) and, for the sample, 7 = 1896, z, = bote 1643. The C 
matrix as defined in section 6.20 is 


          y       x_1     x_2 
y       1.213   1.241   1.256 
x_1     1.241   1.302   1.335 
x_2     1.256   1.335   1.381 


Estimate $\bar{Y}$ by (a) the sample mean, (b) the ratio of 1950 to 1940 numbers of inhabitants, 
and (c) the bivariate ratio estimate. Compute the estimated standard error of each 
estimate. 


6.9 Prove that with Midzuno’s method of sample selection (section 6.15) the probability 
that any specific sample will be drawn is 

$$\frac{(n-1)!\,(N-n)!}{(N-1)!}\;\frac{\sum^n (x_i)}{X}$$


6.10 In small populations the leading term in the bias of $\hat{R}$ in simple random samples of 
size $n$ is of the form 

$$E(\hat{R} - R) = \frac{b_1}{n} - \frac{b_1}{N}$$

where $b_1$ does not depend on $n$, $N$. If $n = mg$ and the sample is divided at random 
into $g$ groups of size $m$, let $\hat{R}_j = \sum y/\sum x$ taken over the remaining $(n - m)$ sample members 
when group $j$ is omitted from the sample. Show that in the bias of the estimate 

$$w\hat{R} - (w-1)\hat{R}_j$$

both terms in $b_1$ vanish if $w = g[1-(n-m)/N]$. 


CHAPTER 7 


Regression Estimators 


7.1 THE LINEAR REGRESSION ESTIMATE 


Like the ratio estimate, the linear regression estimate is designed to increase 
precision by the use of an auxiliary variate $x_i$ that is correlated with $y_i$. When the 
relation between $y_i$ and $x_i$ is examined, it may be found that although the relation 
is approximately linear, the line does not go through the origin. This suggests an 
estimate based on the linear regression of $y_i$ on $x_i$ rather than on the ratio of the 
two variables. 

We suppose that $y_i$ and $x_i$ are each obtained for every unit in the sample and 
that the population mean $\bar{X}$ of the $x_i$ is known. The linear regression estimate of 
$\bar{Y}$, the population mean of the $y_i$, is 

$$\bar{y}_{lr} = \bar{y} + b(\bar{X} - \bar{x}) \tag{7.1}$$

where the subscript $lr$ denotes linear regression and $b$ is an estimate of the change 
in $y$ when $x$ is increased by unity. The rationale of this estimate is that if $\bar{x}$ is below 
average we should expect $\bar{y}$ also to be below average by an amount $b(\bar{X} - \bar{x})$ 
because of the regression of $y_i$ on $x_i$. For an estimate of the population total $Y$, we 
take $\hat{Y}_{lr} = N\bar{y}_{lr}$. 

Watson (1937) used a regression of leaf area on leaf weight to estimate the 
average area of the leaves on a plant. The procedure was to weigh all the leaves on 
the plant. For a small sample of leaves, the area and the weight of each leaf were 
determined. The sample mean leaf area was then adjusted by means of the 
regression on leaf weight. The point of the application is, of course, that the weight 
of a leaf can be found quickly but determination of its area is more time 
consuming. 

This example illustrates a general situation in which regression estimates are 
helpful. Suppose that we can make a rapid estimate $x_i$ of some characteristic for 
every unit and can also, by some more costly method, determine the correct value 
$y_i$ of the characteristic for a simple random sample of the units. A rat expert might 
make a quick eye estimate of the number of rats in each block in a city area and 
then determine, by trapping, the actual number of rats in each of a simple random 
sample of the blocks. In another application described by Yates (1960), an eye 
estimate of the volume of timber was made on each of a population of 1/10-acre 
plots, and the actual timber volume was measured for a sample of the plots. The 
regression estimate 

$$\bar{y} + b(\bar{X} - \bar{x})$$

adjusts the sample mean of the actual measurements by the regression of the 
actual measurements on the rapid estimates. The rapid estimates need not be free 
from bias. If $x_i - y_i = D$, so that the rapid estimate is perfect except for a constant 
bias $D$, then with $b = 1$ the regression estimate becomes 

$$\bar{y} + (\bar{X} - \bar{x}) = \bar{X} + (\bar{y} - \bar{x})$$
= (pop. mean of rapid estimate) + (adjustment for bias) 


If no linear regression model is assumed, our knowledge of the properties of the 
regression estimate is of the same scope as our knowledge for the ratio estimate. 
The regression estimate is consistent, in the trivial sense that when the sample 
comprises the whole population, $\bar{x} = \bar{X}$, and the regression estimate reduces to $\bar{Y}$. 
As will be shown, the estimate is in general biased, but the ratio of the bias to the 
standard error becomes small when the sample is large. We possess a large-sample 
formula for the variance of the estimate, but more information is needed about the 
distribution of the estimate in small samples and about the value of $n$ required for 
the practical use of large-sample results. 

By a suitable choice of $b$, the regression estimate includes as particular cases 
both the mean per unit and the ratio estimate. Obviously if $b$ is taken as zero, $\bar{y}_{lr}$ 
reduces to $\bar{y}$. If $b = \bar{y}/\bar{x}$, 

$$\bar{y}_{lr} = \bar{y} + \frac{\bar{y}}{\bar{x}}(\bar{X} - \bar{x}) = \frac{\bar{y}}{\bar{x}}\bar{X} = \hat{\bar{Y}}_R \tag{7.2}$$


7.2 REGRESSION ESTIMATES WITH PREASSIGNED b 


Although, in most applications, b is estimated from the results of the sample, it 
is sometimes reasonable to choose the value of b in advance. In repeated surveys, 
previous calculations may have shown that the sample values of b remain 
fairly constant; or, if x is the value of y at a recent census, general knowledge of 
the population may suggest that b is not far from unity, so that b= 1 is chosen. 
Since the sampling theory of regression estimates when b is preassigned is both 
simple and informative, this case is considered first. 


Theorem 7.1. In simple random sampling, in which $b_0$ is a preassigned 
constant, the linear regression estimate 

$$\bar{y}_{lr} = \bar{y} + b_0(\bar{X} - \bar{x})$$

is unbiased, with variance 

$$V(\bar{y}_{lr}) = \frac{1-f}{n}\,\frac{\sum^N [(y_i - \bar{Y}) - b_0(x_i - \bar{X})]^2}{N-1} \tag{7.3}$$

$$= \frac{1-f}{n}(S_y^2 - 2b_0S_{yx} + b_0^2S_x^2) \tag{7.4}$$

Note that no assumption is required about the relation between $y$ and $x$ in the 
finite population. 


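Proof. Since $b_0$ is constant in repeated sampling, 

$$E(\bar{y}_{lr}) = E(\bar{y}) + b_0E(\bar{X} - \bar{x}) = \bar{Y} \tag{7.5}$$

by theorem 2.1. Furthermore, $\bar{y}_{lr}$ is the sample mean of the quantities $y_i - 
b_0(x_i - \bar{X})$, whose population mean is $\bar{Y}$. Hence, by theorem 2.2, 

$$V(\bar{y}_{lr}) = \frac{1-f}{n}\,\frac{\sum^N [(y_i - \bar{Y}) - b_0(x_i - \bar{X})]^2}{N-1} \tag{7.6}$$

$$= \frac{1-f}{n}(S_y^2 - 2b_0S_{yx} + b_0^2S_x^2) \tag{7.7}$$

Corollary. An unbiased sample estimate of $V(\bar{y}_{lr})$ is 

$$v(\bar{y}_{lr}) = \frac{1-f}{n}\,\frac{\sum^n [(y_i - \bar{y}) - b_0(x_i - \bar{x})]^2}{n-1} \tag{7.8}$$

$$= \frac{1-f}{n}(s_y^2 - 2b_0s_{yx} + b_0^2s_x^2) \tag{7.9}$$

This follows at once by applying theorem 2.4 to the variate $y_i - b_0(x_i - \bar{X})$. 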
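
Theorem 7.1 can be checked by brute force on a very small population. The Python sketch below enumerates every simple random sample of size n from a hypothetical population of five units, using exact rational arithmetic, and compares the mean and variance of the regression estimate over all samples with $\bar{Y}$ and with formula (7.4). The population values and the preassigned slope $b_0 = 1$ are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical population of N = 5 units (values chosen only for illustration).
Y = [Fraction(v) for v in (3, 5, 8, 12, 7)]
X = [Fraction(v) for v in (1, 4, 6, 10, 4)]
N, n = len(Y), 2
b0 = Fraction(1)                 # preassigned slope, an arbitrary choice

Ybar = sum(Y) / N
Xbar = sum(X) / N

# Population variances and covariance with divisor N - 1.
Sy2 = sum((y - Ybar) ** 2 for y in Y) / (N - 1)
Sx2 = sum((x - Xbar) ** 2 for x in X) / (N - 1)
Syx = sum((y - Ybar) * (x - Xbar) for y, x in zip(Y, X)) / (N - 1)

# Regression estimate for every possible simple random sample of size n.
estimates = []
for idx in combinations(range(N), n):
    ybar = sum(Y[i] for i in idx) / n
    xbar = sum(X[i] for i in idx) / n
    estimates.append(ybar + b0 * (Xbar - xbar))

# Mean and variance of the estimate over the equally likely samples.
mean_est = sum(estimates) / len(estimates)
var_est = sum((e - Ybar) ** 2 for e in estimates) / len(estimates)

# Formula (7.4).
f = Fraction(n, N)
var_formula = (1 - f) / n * (Sy2 - 2 * b0 * Syx + b0 ** 2 * Sx2)
```

Because fractions are used throughout, the enumerated variance and the formula agree exactly rather than only to rounding error.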


A natural question at this point is: What is the best value of $b_0$? The answer is 
given in theorem 7.2. 


Theorem 7.2. The value of $b_0$ that minimizes $V(\bar{y}_{lr})$ is 

$$b_0 = B = \frac{S_{yx}}{S_x^2} = \frac{\sum^N (y_i - \bar{Y})(x_i - \bar{X})}{\sum^N (x_i - \bar{X})^2} \tag{7.10}$$

which may be called the linear regression coefficient of $y$ on $x$ in the finite 
population. Note that $B$ does not depend on the properties of any sample that is 
drawn, and therefore could theoretically be preassigned. The resulting minimum 
variance is 

$$V_{min}(\bar{y}_{lr}) = \frac{1-f}{n}S_y^2(1 - \rho^2) \tag{7.11}$$

where $\rho$ is the population correlation coefficient between $y$ and $x$. 
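Proof. In expression (7.4), for $V(\bar{y}_{lr})$, put 

$$b_0 = B + d = \frac{S_{yx}}{S_x^2} + d \tag{7.12}$$

This gives 

$$V(\bar{y}_{lr}) = \frac{1-f}{n}\left[S_y^2 - 2S_{yx}\left(\frac{S_{yx}}{S_x^2} + d\right) + S_x^2\left(\frac{S_{yx}^2}{S_x^4} + 2d\frac{S_{yx}}{S_x^2} + d^2\right)\right]$$

$$= \frac{1-f}{n}\left[\left(S_y^2 - \frac{S_{yx}^2}{S_x^2}\right) + d^2S_x^2\right] \tag{7.13}$$

Clearly, this is minimized when $d = 0$. Since $\rho^2 = S_{yx}^2/S_y^2S_x^2$, 

$$V_{min}(\bar{y}_{lr}) = \frac{1-f}{n}S_y^2(1 - \rho^2) \tag{7.14}$$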


The same analysis may be used to show how far $b_0$ can depart from $B$ without 
incurring a substantial loss of precision. From (7.13) and (7.14), 

$$V(\bar{y}_{lr}) = \frac{1-f}{n}\left[S_y^2(1 - \rho^2) + (b_0 - B)^2S_x^2\right] \tag{7.15}$$

$$= V_{min}(\bar{y}_{lr})\left[1 + \frac{(b_0 - B)^2S_x^2}{S_y^2(1 - \rho^2)}\right] \tag{7.16}$$

Since $BS_x = \rho S_y$, this may be written 

$$V(\bar{y}_{lr}) = V_{min}(\bar{y}_{lr})\left[1 + \left(\frac{b_0}{B} - 1\right)^2\frac{\rho^2}{(1 - \rho^2)}\right] \tag{7.17}$$

Thus, if the proportional increase in variance is to be less than $\alpha$, we must have 

$$\left|\frac{b_0}{B} - 1\right| < \sqrt{\alpha(1 - \rho^2)/\rho^2} \tag{7.18}$$

For example, if $\rho = 0.7$, the increase in variance is less than 10% ($\alpha = 0.1$), 
provided that 

$$\left|\frac{b_0}{B} - 1\right| < \sqrt{(0.1)(0.51)/(0.49)} = 0.32$$

Expression (7.18) makes it clear that in order to ensure a small proportional 
increase in variance $b_0/B$ must be close to 1 if $\rho$ is very high but can depart 
substantially from 1 if $\rho$ is only moderate. 
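
The bound in (7.18) is simple to tabulate. The following Python sketch, with function names chosen here for illustration, evaluates the permissible relative departure of $b_0$ from $B$ for a given $\alpha$ and $\rho$, and checks it against the proportional increase implied by (7.17).

```python
from math import sqrt


def max_relative_departure(alpha, rho):
    """Upper bound on |b0/B - 1| from (7.18)."""
    return sqrt(alpha * (1.0 - rho ** 2) / rho ** 2)


def proportional_increase(ratio_b0_to_B, rho):
    """Proportional increase in variance implied by (7.17)."""
    return (ratio_b0_to_B - 1.0) ** 2 * rho ** 2 / (1.0 - rho ** 2)


bound_moderate = max_relative_departure(0.1, 0.70)   # about 0.32
bound_high = max_relative_departure(0.1, 0.95)       # about 0.10
```

For the same tolerated increase of 10%, the bound shrinks from roughly 0.32 at $\rho = 0.7$ to roughly 0.10 at $\rho = 0.95$, which is the point made in the paragraph above.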


7.3 REGRESSION ESTIMATES WHEN b IS COMPUTED FROM 
THE SAMPLE 


Theorem 7.2 suggests that if $b$ must be computed from the sample an effective 
estimate is likely to be the familiar least squares estimate of $B$, that is, 

$$b = \frac{\sum^n (y_i - \bar{y})(x_i - \bar{x})}{\sum^n (x_i - \bar{x})^2} \tag{7.19}$$

The theory of linear regression plays a prominent part in statistical methodology. 
The standard results of this theory are not entirely suitable for sample surveys 
because they require the assumptions that the population regression of $y$ on $x$ is 
linear, that the residual variance of $y$ about the regression line is constant, and that 
the population is infinite. If the first two assumptions are violently wrong, a linear 
regression estimate will probably not be used. However, in surveys in which the 
regression of $y$ on $x$ is thought to be approximately linear, it is helpful to be able to 
use $\bar{y}_{lr}$ without having to assume exact linearity or constant residual variance. 

Consequently we present an approach that makes no assumption of any specific 
relation between $y_i$ and $x_i$. As in the analogous theory for the ratio estimate, only 
large-sample results are obtained. 

With b as in (7.19), the linear regression estimator of $\bar{Y}$ in simple random
samples is

$$\bar{y}_{lr} = \bar{y} + b(\bar{X}-\bar{x}) = \bar{y} - b(\bar{x}-\bar{X}) \tag{7.20}$$

The estimator $\bar{y}_{lr}$, like the ratio estimate $\hat{\bar{Y}}_R$, will be shown in section 7.7 to have a bias of order 1/n.
In finding the sampling error of $\bar{y}_{lr}$, replace the sample b in (7.20) by the
population regression coefficient B in (7.10). In Theorem 7.3 the error committed
in this approximation will be shown to be of order $1/\sqrt{n}$ relative to the terms
retained. We first examine the relation between b and B.
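Introduce the variate $e_i$ defined by the relation

$$e_i = y_i - \bar{Y} - B(x_i-\bar{X}) \tag{7.21}$$

Two properties of the $e_i$ are that $\sum^N e_i = 0$ and

$$\sum^N e_i(x_i-\bar{X}) = \sum^N (y_i-\bar{Y})(x_i-\bar{X}) - B\sum^N (x_i-\bar{X})^2 = 0 \tag{7.22}$$

by definition of B. Now

$$b = \sum^n y_i(x_i-\bar{x})\Big/\sum^n (x_i-\bar{x})^2$$

$$= \sum^n [\bar{Y} + B(x_i-\bar{X}) + e_i](x_i-\bar{x})\Big/\sum^n (x_i-\bar{x})^2$$

$$= B + \sum^n e_i(x_i-\bar{x})\Big/\sum^n (x_i-\bar{x})^2 \tag{7.23}$$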


A result needed in Theorem 7.3 is that (b - B) is of order $1/\sqrt{n}$. By theorem 2.3,
$\sum^n e_i(x_i-\bar{x})/(n-1)$ is an unbiased estimate of $\sum^N e_i(x_i-\bar{X})/(N-1)$ which, by
(7.22), is zero. Thus $\sum^n e_i(x_i-\bar{x})/(n-1)$ is distributed about a zero mean in
repeated samples. Since the standard error of a sample covariance is known to be
of order $1/\sqrt{n}$, $\sum^n e_i(x_i-\bar{x})/(n-1)$ is of order $1/\sqrt{n}$. But $\sum^n (x_i-\bar{x})^2/(n-1) = s_x^2$ is
of order unity. Hence (b - B), which from (7.23) is the ratio of these two
quantities, is of order $1/\sqrt{n}$.


Theorem 7.3. If b is the least squares estimate of B and

$$\bar{y}_{lr} = \bar{y} + b(\bar{X}-\bar{x}) \tag{7.24}$$

then in simple random samples of size n, with n large,

$$V(\bar{y}_{lr}) \doteq \frac{1-f}{n}S_y^2(1-\rho^2) \tag{7.25}$$

where $\rho = S_{yx}/S_yS_x$ is the population correlation between y and x.
Proof. The sampling error of $\bar{y}_{lr}$ arises from the quantity

$$\bar{y}_{lr} - \bar{Y} = \bar{y} - \bar{Y} + b(\bar{X}-\bar{x}) \tag{7.26}$$

As an approximation, replace $\bar{y}_{lr}$ by

$$\bar{y}_{lr}' = \bar{y} + B(\bar{X}-\bar{x}) \tag{7.27}$$

where B is the population linear regression coefficient of y on x. The error
committed in this approximation is $(B-b)(\bar{X}-\bar{x})$. This quantity is of order 1/n in
a simple random sample of size n, since (b - B) and $(\bar{x}-\bar{X})$ are both of order $1/\sqrt{n}$.
But the sampling error in $\bar{y}_{lr}'$ is of order $1/\sqrt{n}$, since it is the error in the sample
mean of the variate $(y_i - Bx_i)$. Hence the leading term in $E(\bar{y}_{lr}-\bar{Y})^2$ is $V(\bar{y}_{lr}')$. By
(7.11), in large samples,

$$E(\bar{y}_{lr}-\bar{Y})^2 \doteq V(\bar{y}_{lr}') = \frac{1-f}{n}S_y^2(1-\rho^2) \tag{7.28}$$

7.4 SAMPLE ESTIMATE OF VARIANCE


As a sample estimate of $V(\bar{y}_{lr})$, valid in large samples, we may use

$$v(\bar{y}_{lr}) = \frac{1-f}{n(n-2)}\sum_{i=1}^{n}[(y_i-\bar{y}) - b(x_i-\bar{x})]^2 \tag{7.29}$$

$$= \frac{1-f}{n(n-2)}\left\{\sum(y_i-\bar{y})^2 - \frac{[\sum(y_i-\bar{y})(x_i-\bar{x})]^2}{\sum(x_i-\bar{x})^2}\right\} \tag{7.30}$$

the latter being the usual short-cut computing formula. The derivation is as
follows.
In theorem 7.3, equation (7.28), we had, since $S_y^2(1-\rho^2) = S_e^2$,

$$V(\bar{y}_{lr}) \doteq \frac{1-f}{n}S_e^2$$

From theorem 2.4, an unbiased estimate of $S_e^2$ is

$$s_e^2 = \frac{1}{n-1}\sum_{i=1}^{n}(e_i-\bar{e})^2$$

Now, from equation (7.21), it follows that

$$e_i - \bar{e} = (y_i-\bar{y}) - B(x_i-\bar{x}) = [(y_i-\bar{y}) - b(x_i-\bar{x})] + (b-B)(x_i-\bar{x}) \tag{7.31}$$

The second term on the right, of order $1/\sqrt{n}$, may be neglected in relation to the
first term, which is of order unity. Hence in large samples we may use

$$\frac{1}{n-1}\sum_{i=1}^{n}[(y_i-\bar{y}) - b(x_i-\bar{x})]^2 \tag{7.32}$$

as an estimate of $S_e^2$. The divisor (n - 2) instead of (n - 1) is suggested in
(7.29) and (7.30) because it is used in standard regression theory and is known to
give an unbiased estimate of $S_e^2$ if the population is infinite and the regression
is linear.
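As an illustration of (7.19), (7.20), (7.29) and (7.30), the computation can be sketched as follows. This is a minimal sketch under stated assumptions: the function names and the small data set are hypothetical and are not taken from the text.

```python
def linear_regression_estimate(y, x, X_bar, N):
    """Return (y_lr, b, v_lr) for a simple random sample.

    y, x  : paired sample values (n >= 3, x not all equal)
    X_bar : known population mean of x
    N     : population size
    """
    n = len(y)
    if n != len(x) or n < 3:
        raise ValueError("need paired samples with n >= 3")
    y_bar = sum(y) / n
    x_bar = sum(x) / n
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((yi - y_bar) * (xi - x_bar) for yi, xi in zip(y, x))
    b = sxy / sxx                                # least squares slope, (7.19)
    y_lr = y_bar + b * (X_bar - x_bar)           # regression estimate, (7.20)
    f = n / N                                    # sampling fraction
    v_lr = (1 - f) / (n * (n - 2)) * (syy - sxy ** 2 / sxx)   # short-cut form (7.30)
    return y_lr, b, v_lr


def variance_estimate_long_form(y, x, b, N):
    """Same estimate computed from the residuals, as in (7.29)."""
    n = len(y)
    y_bar = sum(y) / n
    x_bar = sum(x) / n
    rss = sum(((yi - y_bar) - b * (xi - x_bar)) ** 2 for yi, xi in zip(y, x))
    return (1 - n / N) / (n * (n - 2)) * rss


# Hypothetical sample of n = 6 from a population of N = 60 with X_bar = 13.5.
y = [12.0, 15.0, 11.0, 20.0, 17.0, 14.0]
x = [10.0, 13.0, 9.0, 18.0, 16.0, 12.0]
y_lr, b, v_lr = linear_regression_estimate(y, x, X_bar=13.5, N=60)
print(y_lr, b, v_lr ** 0.5)
```

The two functions should agree to rounding error, since (7.30) is only an algebraic rearrangement of (7.29).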


7.5 LARGE-SAMPLE COMPARISON WITH THE RATIO 
ESTIMATE AND THE MEAN PER UNIT 


For these comparisons the sample size n must be large enough so that the 
approximate formulas for the variances of the ratio and regression estimates are 
valid. The three comparable variances for the estimated population mean $\bar{Y}$ are as
follows.

$$V(\bar{y}_{lr}) = \frac{1-f}{n}S_y^2(1-\rho^2) \qquad \text{(regression)}$$

$$V(\hat{\bar{Y}}_R) = \frac{1-f}{n}(S_y^2 + R^2S_x^2 - 2R\rho S_yS_x) \qquad \text{(ratio)}$$

$$V(\bar{y}) = \frac{1-f}{n}S_y^2 \qquad \text{(mean per unit)}$$

It is apparent that the variance of the regression estimate is smaller than that of
the mean per unit unless $\rho = 0$, in which case the two variances are equal.
The variance of the regression estimate is less than that of the ratio estimate if

$$-\rho^2S_y^2 < R^2S_x^2 - 2R\rho S_yS_x \tag{7.33}$$

This is equivalent to the inequalities

$$(\rho S_y - RS_x)^2 > 0 \quad \text{or} \quad (B-R)^2 > 0 \tag{7.34}$$

Thus the regression estimate is more precise than the ratio estimate unless
B = R. This occurs when the relation between $y_i$ and $x_i$ is a straight line through
the origin.


Example. The precision of the regression, ratio, and mean per unit estimates from a
simple random sample can be compared by using data collected in the complete enumeration
of peach orchards described on p. 172. In this example, $y_i$ is the estimated peach
production in an orchard and $x_i$ the number of peach trees in the orchard. We will compare
the estimates of the total production of the 256 orchards, made from a sample of 100
orchards. It is doubtful whether the sample is large enough to make the variance formulas
fully valid, since the cv's of $\bar{y}$ and $\bar{x}$ are both somewhat higher than 10%, but the example
will serve to illustrate the computations. The basic data are as follows.

$S_y^2 = 6409$    $S_{yx} = 4434$    $S_x^2 = 3898$
$R = 1.270$    $\rho = 0.887$    $n = 100$    $N = 256$

$$V(\hat{Y}_{lr}) = \frac{N(N-n)}{n}S_y^2(1-\rho^2) = \frac{(256)(156)}{100}(6409)(1-0.787) = 545{,}000$$

$$V(\hat{Y}_R) = \frac{N(N-n)}{n}(S_y^2 + R^2S_x^2 - 2RS_{yx}) = \frac{(256)(156)}{100}[6409 + (1.613)(3898) - 2(1.270)(4434)] = 573{,}000$$

$$V(\hat{Y}) = \frac{N(N-n)}{n}S_y^2 = 2{,}559{,}000$$

There is little to choose between the regression and the ratio estimates, as might be
expected from the nature of the variables. Both techniques are greatly superior to the mean
per unit.
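The arithmetic of this example can be checked with a few lines of code. The sketch below uses only the basic data quoted in the example; the variable names are illustrative, and the results are rounded to the nearest thousand as in the text.

```python
# Basic data from the peach orchard example.
N, n = 256, 100
S_y2, S_yx, S_x2 = 6409.0, 4434.0, 3898.0
R = 1.270

rho2 = S_yx ** 2 / (S_y2 * S_x2)      # squared population correlation
factor = N * (N - n) / n              # N^2 (1 - f) / n for estimated totals

V_regression = factor * S_y2 * (1 - rho2)
V_ratio = factor * (S_y2 + R ** 2 * S_x2 - 2 * R * S_yx)
V_mean_per_unit = factor * S_y2

for name, value in [("regression", V_regression),
                    ("ratio", V_ratio),
                    ("mean per unit", V_mean_per_unit)]:
    print(f"{name:>14}: {round(value, -3):,.0f}")
```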


The preceding result on the superiority of the linear regression estimate is
strictly a large-sample result. In small samples on natural populations the
regression estimate appears disappointing in performance. In eight natural populations
of the type in which the ratio estimate has been used, Rao (1969) found in a Monte
Carlo study that the average of the ratios $MSE(\bar{y}_{lr})/MSE(\hat{\bar{Y}}_R)$ was 1.15 for
n = 12, 1.36 for n = 8, and 1.51 for n = 6. These lower efficiencies of $\bar{y}_{lr}$ were not
due to greater biases in the regression estimates, the corresponding variance ratios
being almost the same.


7.6 ACCURACY OF THE LARGE-SAMPLE FORMULAS FOR
$V(\bar{y}_{lr})$ AND $v(\bar{y}_{lr})$


No general analytical results are available on the accuracy of the approximate
formulas (7.25) for $V(\bar{y}_{lr})$ and (7.29) for $v(\bar{y}_{lr})$ in moderate or small samples. The
approximate formulas in (7.25) and (7.29) are

$$V(\bar{y}_{lr}) \doteq \frac{1-f}{n}S_y^2(1-\rho^2) \tag{7.25}$$

$$v(\bar{y}_{lr}) = \frac{1-f}{n(n-2)}\sum_{i=1}^{n}[(y_i-\bar{y}) - b(x_i-\bar{x})]^2 \tag{7.29}$$

Suppose that the $y_i$ for i = 1, 2, ... N are a random sample from an infinite
population under the model

$$y_i = \alpha + \beta x_i + \varepsilon_i \tag{7.35}$$

where for fixed $x_i$ the $\varepsilon_i$ are independently distributed with mean 0, variance
$\sigma_\varepsilon^2 = \sigma_y^2(1-\rho^2)$. With this model, Cochran (1942) gave the result that to terms of
order $1/n^2$,

$$E\,V(\bar{y}_{lr}) = \frac{1-f}{n}\sigma_y^2(1-\rho^2)\left(1 + \frac{1}{n-3} + \frac{2G_x^2}{n^2}\right) \tag{7.36}$$

where $G_x = k_{3x}/\sigma_x^3$ is Fisher's measure of relative skewness of the distribution of
x. Since $S_y^2(1-\rho^2)$ in (7.25) is an unbiased estimate of $\sigma_y^2(1-\rho^2)$ under this
model, (7.36) suggests that with x symmetrically distributed the percent
underestimation by $V(\bar{y}_{lr})$ is 100/(n - 2) with this model.
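The step from (7.36) to the figure 100/(n - 2) can be spelled out. The following worked lines are an added sketch, assuming that the skewness term vanishes ($G_x = 0$) when x is symmetrically distributed.

```latex
% With G_x = 0, (7.36) reduces to
E\,V(\bar{y}_{lr}) \doteq \frac{1-f}{n}\,\sigma_y^2(1-\rho^2)\left(1+\frac{1}{n-3}\right)
                 = \frac{1-f}{n}\,\sigma_y^2(1-\rho^2)\,\frac{n-2}{n-3}
% so the approximation (7.25) falls short of this value by the fraction
1 - \frac{n-3}{n-2} = \frac{1}{n-2}
% giving 25%, 16.7%, and 10% for n = 6, 8, 12.
```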

From Monte Carlo studies on eight small natural populations (Rao, 1968),
Table 7.1 shows for n = 6, 8, 12, the average percent underestimation of the
variance of $\bar{y}_{lr}$ by the approximations $V(\bar{y}_{lr})$ and $v(\bar{y}_{lr})$.

TABLE 7.1
AVERAGE PERCENT UNDERESTIMATION OF THE VARIANCE OF $\bar{y}_{lr}$

                                        n
Estimator                          6    8   12

$V(\bar{y}_{lr})$ in (7.25)       38   34   28
$v(\bar{y}_{lr})$ in (7.29)       48   42   33

For $x_i$ symmetrical, formula (7.36) suggests percent underestimations by $V(\bar{y}_{lr})$
of 25, 17, and 10% for n = 6, 8, 12. The percents for $V(\bar{y}_{lr})$ in Table 7.1 are
substantially higher, by amounts judged unlikely to be accounted for by skewness
in x in these populations; deficiencies in the linear model are a more likely cause. The
underestimations are still greater for the sample estimates of variance $v(\bar{y}_{lr})$.
Furthermore, comparison with Table 6.2, page 164, which applies to the ratio
estimate in the same eight populations, shows that the percent underestimations
in V and v for $\bar{y}_{lr}$ are at least twice those for $\hat{\bar{Y}}_R$ in samples of the same size.


7.7 BIAS OF THE LINEAR REGRESSION ESTIMATE


The estimator $\bar{y}_{lr}$ has a bias of order 1/n in simple random sampling. We have

$$E(\bar{y}_{lr}) = \bar{Y} - E\,b(\bar{x}-\bar{X}) \tag{7.37}$$

Thus one expression for the bias is $-E\,b(\bar{x}-\bar{X}) = -\mathrm{cov}(b, \bar{x})$. The leading term in
the bias turns out to be

$$-\frac{(1-f)}{n}\,\frac{E\,e_i(x_i-\bar{X})^2}{S_x^2} \tag{7.38}$$

This term represents a contribution from the quadratic component of the regression
of y on x. Thus, if a sample plot of $y_i$ against $x_i$ appears approximately linear,
there should be little risk of major bias in $\bar{y}_{lr}$.

To show (7.38) requires some algebraic development. By (7.23), page 194,

$$b = B + \frac{\sum^n e_i(x_i-\bar{x})}{\sum^n (x_i-\bar{x})^2} \tag{7.39}$$

Replace $\sum (x_i-\bar{x})^2$ by its leading term, $nS_x^2$. Also write

$$\sum^n e_i(x_i-\bar{x}) = \sum^n e_i(x_i-\bar{X}) + n\bar{e}(\bar{X}-\bar{x}) \tag{7.40}$$

Hence the leading term in the bias $-E\,b(\bar{x}-\bar{X})$ of $\bar{y}_{lr}$ is the average of

$$-\frac{\sum^n e_i(x_i-\bar{X})(\bar{x}-\bar{X})}{nS_x^2} + \frac{\bar{e}(\bar{x}-\bar{X})^2}{S_x^2} \tag{7.41}$$

Let $u_i = e_i(x_i-\bar{X})$. By (7.22) its population mean $\bar{U} = 0$. The average value of the
first term in (7.41) may therefore be written

$$-\frac{E(\bar{u}-\bar{U})(\bar{x}-\bar{X})}{S_x^2} = -\frac{(1-f)}{n}\,\frac{E(u_i-\bar{U})(x_i-\bar{X})}{S_x^2} \tag{7.42}$$

by theorem 2.3 (p. 25) for the average value of a sample covariance in simple
random sampling. This in turn equals (7.38), namely

$$-\frac{(1-f)}{n}\,\frac{E\,e_i(x_i-\bar{X})^2}{S_x^2}$$

In the second term in (7.41), $\bar{e}$ is $O(1/\sqrt{n})$ and $(\bar{x}-\bar{X})^2$ is O(1/n), so that this term
is of smaller order than (7.38). Thus (7.38) is the leading term in the bias of $\bar{y}_{lr}$.
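The role of the quadratic component can be seen by enumerating every possible simple random sample from a very small population. The sketch below is illustrative only: the two six-unit populations (one exactly linear, one exactly quadratic in x) are hypothetical and chosen so that no sample has all x values equal.

```python
from itertools import combinations


def regression_estimate_of_mean(sample, X_bar):
    """y_bar + b (X_bar - x_bar) with the least squares b of the sample."""
    n = len(sample)
    x_bar = sum(x for x, _ in sample) / n
    y_bar = sum(y for _, y in sample) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in sample)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in sample)
    return y_bar + (sxy / sxx) * (X_bar - x_bar)


def exact_bias(population, n):
    """Average of (estimate - Y_bar) over all equally likely samples of size n."""
    N = len(population)
    X_bar = sum(x for x, _ in population) / N
    Y_bar = sum(y for _, y in population) / N
    estimates = [regression_estimate_of_mean(s, X_bar)
                 for s in combinations(population, n)]
    return sum(estimates) / len(estimates) - Y_bar


linear_pop = [(x, 2 + 3 * x) for x in range(1, 7)]     # y exactly linear in x
quadratic_pop = [(x, x * x) for x in range(1, 7)]      # y exactly quadratic in x

bias_linear = exact_bias(linear_pop, 3)
bias_quadratic = exact_bias(quadratic_pop, 3)
print(bias_linear, bias_quadratic)
```

For the linear population the estimate reproduces the population mean in every sample, so the bias is zero apart from rounding; for the quadratic population the bias is negative, in agreement with the sign of (7.38) when the $e_i$ are positive at the extremes of x.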


7.8 THE LINEAR REGRESSION ESTIMATOR 
UNDER A LINEAR REGRESSION MODEL 


Suppose that the finite population values $y_i$ (i = 1, 2, ... N) are randomly
drawn from an infinite superpopulation in which

$$y = \alpha + \beta x + \varepsilon \tag{7.43}$$

where the $\varepsilon$ are independent, with means 0 and variance $\sigma_\varepsilon^2$ for fixed x. By direct
substitution from the model we find that

$$b = \beta + \frac{\sum^n \varepsilon_i(x_i-\bar{x})}{\sum^n (x_i-\bar{x})^2} \tag{7.44}$$

$$\bar{y}_{lr} - \bar{Y} = \bar{\varepsilon}_n - \bar{\varepsilon}_N + (\bar{X}-\bar{x})\,\frac{\sum^n \varepsilon_i(x_i-\bar{x})}{\sum^n (x_i-\bar{x})^2} \tag{7.45}$$

where $\bar{\varepsilon}_n$ and $\bar{\varepsilon}_N$ are means over the sample and the finite population. It follows
from (7.45) that under this model, $E(\bar{y}_{lr}-\bar{Y}) = 0$, so that $\bar{y}_{lr}$ is model-unbiased for
any size of sample.

As regards variance, it follows from (7.45) that for a given set of x's,

$$V(\bar{y}_{lr}) = E(\bar{y}_{lr}-\bar{Y})^2 = \sigma_\varepsilon^2\left[\left(\frac{1}{n}-\frac{1}{N}\right) + \frac{(\bar{X}-\bar{x})^2}{\sum^n (x_i-\bar{x})^2}\right] \tag{7.46}$$

This result holds for any n > 1 and any sample selected solely by the values of x.
This approach and its generalization to the case of unequal residual variances
were given by Royall (1970). Under this model a purposive sampling plan that
succeeded in making $\bar{x} = \bar{X}$ would minimize $V(\bar{y}_{lr})$ for given n.

Also, for any sample selected solely according to the values of the $x_i$, the usual
least squares estimator

$$s_e^2 = \sum^n [(y_i-\bar{y}) - b(x_i-\bar{x})]^2/(n-2) \tag{7.47}$$

is a model-unbiased estimator of $\sigma_\varepsilon^2$ for n > 2.

Thus, in problems in which this model applies, simple exact results about the
mean and variance of $\bar{y}_{lr}$ can be established, valid for any sample size > 2 and
requiring only sample selection according to the values of the $x_i$, the random
element being supplied gratis by the distribution of the $\varepsilon$'s assumed in the model.
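The effect of balancing the sample on x, noted above in connection with (7.46), can be illustrated numerically. This is a minimal sketch: the function name and the two five-unit samples are hypothetical, and the residual variance is simply assumed known.

```python
def model_variance(sample_x, X_bar, N, sigma2):
    """Model variance (7.46) of the regression estimate for a given set of x's."""
    n = len(sample_x)
    x_bar = sum(sample_x) / n
    sxx = sum((x - x_bar) ** 2 for x in sample_x)
    return sigma2 * ((1 / n - 1 / N) + (X_bar - x_bar) ** 2 / sxx)


N, X_bar, sigma2 = 100, 50.0, 4.0
balanced = [30.0, 45.0, 50.0, 55.0, 70.0]      # sample mean of x equals X_bar
unbalanced = [30.0, 35.0, 40.0, 45.0, 50.0]    # sample mean of x is 40

V_balanced = model_variance(balanced, X_bar, N, sigma2)
V_unbalanced = model_variance(unbalanced, X_bar, N, sigma2)
print(V_balanced, V_unbalanced)
```

When $\bar{x} = \bar{X}$ the second term of (7.46) vanishes and only $\sigma_\varepsilon^2(1/n - 1/N)$ remains, which is the minimum for given n.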


7.9 REGRESSION ESTIMATES IN STRATIFIED SAMPLING 


As with the ratio estimate, two types of regression estimate can be made in
stratified random sampling. In the first estimate $\bar{y}_{lrs}$ (s for separate), a separate
regression estimate is computed for each stratum mean, that is,

$$\bar{y}_{lrh} = \bar{y}_h + b_h(\bar{X}_h - \bar{x}_h) \tag{7.48}$$

Then, with $W_h = N_h/N$,

$$\bar{y}_{lrs} = \sum_h W_h\bar{y}_{lrh} \tag{7.49}$$

This estimate is appropriate when it is thought that the true regression coefficients
$B_h$ vary from stratum to stratum.

The second regression estimate, $\bar{y}_{lrc}$ (c for combined), is appropriate when the
$B_h$ are presumed to be the same in all strata. To compute $\bar{y}_{lrc}$, we first find

$$\bar{y}_{st} = \sum_h W_h\bar{y}_h, \qquad \bar{x}_{st} = \sum_h W_h\bar{x}_h$$

Then

$$\bar{y}_{lrc} = \bar{y}_{st} + b(\bar{X} - \bar{x}_{st}) \tag{7.50}$$

The two estimates will be considered first in the case in which the $b_h$ and b are
chosen in advance, since their properties are unusually simple in this situation.
From section 7.2, $\bar{y}_{lrh}$ is an unbiased estimate of $\bar{Y}_h$, so that $\bar{y}_{lrs}$ is an unbiased
estimate of $\bar{Y}$. Since sampling is independent in different strata, it follows from
theorem 7.1 that

$$V(\bar{y}_{lrs}) = \sum_h \frac{W_h^2(1-f_h)}{n_h}(S_{yh}^2 - 2b_hS_{yxh} + b_h^2S_{xh}^2) \tag{7.51}$$

Theorem 7.2 shows that $V(\bar{y}_{lrs})$ is minimized when $b_h = B_h$, the true regression
coefficient in stratum h. The minimum value of the variance may be written

$$V_{min}(\bar{y}_{lrs}) = \sum_h \frac{W_h^2(1-f_h)}{n_h}\left(S_{yh}^2 - \frac{S_{yxh}^2}{S_{xh}^2}\right) \tag{7.52}$$

Turning to the combined estimate with preassigned b, (7.50) shows that $\bar{y}_{lrc}$ is
also an unbiased estimate of $\bar{Y}$ in this case. Since $\bar{y}_{lrc}$ is the usual estimate from a
stratified sample for the variate $y_{hi} + b(\bar{X} - x_{hi})$, we may apply theorem 5.3 to this
variate, giving the result

$$V(\bar{y}_{lrc}) = \sum_h \frac{W_h^2(1-f_h)}{n_h}(S_{yh}^2 - 2bS_{yxh} + b^2S_{xh}^2) \tag{7.53}$$

The value of b that minimizes this variance is

$$B_c = \sum_h \frac{W_h^2(1-f_h)S_{yxh}}{n_h}\Bigg/\sum_h \frac{W_h^2(1-f_h)S_{xh}^2}{n_h} \tag{7.54}$$

The quantity $B_c$ is a weighted mean of the stratum regression coefficients
$B_h = S_{yxh}/S_{xh}^2$. If we write

$$a_h = \frac{W_h^2(1-f_h)}{n_h}S_{xh}^2$$

then $B_c = \sum a_hB_h/\sum a_h$.
From (7.52) and (7.53), with $B_c$ in place of b, we find

$$V_{min}(\bar{y}_{lrc}) - V_{min}(\bar{y}_{lrs}) = \sum a_hB_h^2 - \left(\sum a_h\right)B_c^2 = \sum a_h(B_h - B_c)^2 \tag{7.55}$$

This result shows that with the optimum choices the separate estimate has a
smaller variance than the combined estimate unless $B_h$ is the same in all strata.
These optimum choices would, of course, require advance knowledge of the $S_{yxh}$
and $S_{xh}^2$ values.
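Relation (7.55) can be checked numerically from complete stratum data. The following is a sketch under stated assumptions: the two small strata and the sample sizes are hypothetical, and the function names are illustrative.

```python
def stratum_moments(units):
    """Return (S_y2, S_yx, S_x2), divisor N_h - 1, for a list of (x, y) pairs."""
    N_h = len(units)
    x_bar = sum(x for x, _ in units) / N_h
    y_bar = sum(y for _, y in units) / N_h
    S_x2 = sum((x - x_bar) ** 2 for x, _ in units) / (N_h - 1)
    S_y2 = sum((y - y_bar) ** 2 for _, y in units) / (N_h - 1)
    S_yx = sum((x - x_bar) * (y - y_bar) for x, y in units) / (N_h - 1)
    return S_y2, S_yx, S_x2


def optimum_variances(strata, n_h):
    """Return (V_min separate, V_min combined, sum a_h (B_h - B_c)^2)."""
    N = sum(len(units) for units in strata)
    rows = []
    for units, n in zip(strata, n_h):
        N_h = len(units)
        c = (N_h / N) ** 2 * (1 - n / N_h) / n        # W_h^2 (1 - f_h) / n_h
        rows.append((c, *stratum_moments(units)))
    a = [c * S_x2 for c, _, _, S_x2 in rows]
    B = [S_yx / S_x2 for _, _, S_yx, S_x2 in rows]
    B_c = sum(ai * Bi for ai, Bi in zip(a, B)) / sum(a)                 # (7.54)
    V_s = sum(c * (S_y2 - S_yx ** 2 / S_x2)
              for c, S_y2, S_yx, S_x2 in rows)                          # (7.52)
    V_c = sum(c * (S_y2 - 2 * B_c * S_yx + B_c ** 2 * S_x2)
              for c, S_y2, S_yx, S_x2 in rows)                          # (7.53), b = B_c
    gap = sum(ai * (Bi - B_c) ** 2 for ai, Bi in zip(a, B))             # (7.55)
    return V_s, V_c, gap


# Hypothetical strata given as (x, y) pairs, with n_h = 2 and 3.
strata = [
    [(1, 2), (2, 3), (3, 5), (4, 4), (5, 6)],
    [(2, 9), (4, 12), (6, 18), (8, 20), (10, 27), (12, 30)],
]
V_s, V_c, gap = optimum_variances(strata, n_h=[2, 3])
print(V_s, V_c, V_c - V_s, gap)
```

The last two printed numbers should coincide, since the excess variance of the combined estimate is exactly the weighted spread of the $B_h$ about $B_c$.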


7.10 REGRESSION COEFFICIENTS ESTIMATED
FROM THE SAMPLE


The preceding analysis is helpful in indicating the type of sample estimates $b_h$
and b that may be efficient when used in regression estimates. With the separate
estimate, the analysis suggests that we take

$$b_h = \frac{\sum_i (y_{hi}-\bar{y}_h)(x_{hi}-\bar{x}_h)}{\sum_i (x_{hi}-\bar{x}_h)^2} \tag{7.56}$$

the within-stratum least squares estimate of $B_h$.

Applying theorem 7.3 to each stratum, we have

$$V(\bar{y}_{lrs}) = \sum_h \frac{W_h^2(1-f_h)}{n_h}S_{yh}^2(1-\rho_h^2) \tag{7.57}$$

provided that the sample size $n_h$ is large in all strata. To obtain a sample estimate
of variance, substitute

$$s_{y.xh}^2 = \frac{1}{n_h-2}\left[\sum_i (y_{hi}-\bar{y}_h)^2 - b_h^2\sum_i (x_{hi}-\bar{x}_h)^2\right] \tag{7.58}$$

in place of $S_{yh}^2(1-\rho_h^2)$ in (7.57).

The estimate $\bar{y}_{lrs}$ suffers from the same difficulty as the corresponding ratio
estimate, in that the ratio of the bias to the standard error may become appreciable.
It follows from section 7.7 that the regression estimates $\bar{y}_{lrh}$ in the individual
strata may have biases of order $1/n_h$, and the biases may be of the same sign in all
strata, so that the over-all bias in $\bar{y}_{lrs}$ may also be of order $1/n_h$. Since the leading
term in the bias comes from the quadratic regression of $y_{hi}$ on $x_{hi}$, as shown in
section 7.7, this danger is most acute when the relation between the variates
approximates the quadratic rather than the linear type.

With the combined estimate, we saw that the variance is minimized when
$b = B_c$, as defined in (7.54). This suggests that we take

$$b_c = \sum_h \frac{W_h^2(1-f_h)}{n_h(n_h-1)}\sum_i (y_{hi}-\bar{y}_h)(x_{hi}-\bar{x}_h)\Bigg/\sum_h \frac{W_h^2(1-f_h)}{n_h(n_h-1)}\sum_i (x_{hi}-\bar{x}_h)^2$$

as a sample estimate of $B_c$. If the stratification is proportional and if we may
replace the $(n_h - 1)$ in $b_c$ by $n_h$, $b_c$ reduces to the familiar pooled least squares
estimate

$$b_c' = \sum_h\sum_i (y_{hi}-\bar{y}_h)(x_{hi}-\bar{x}_h)\Big/\sum_h\sum_i (x_{hi}-\bar{x}_h)^2$$

In certain circumstances other estimates may be preferable to $b_c$ or $b_c'$. For
instance, if the true regression coefficients $B_h$ are the same in all strata but the
residual variances about the regression line differ substantially from one stratum
to another, a different weighted mean of the $b_h$, weighting inversely as the
estimated variance, may be more precise. However, the gain in precision as it
affects $\bar{y}_{lrc}$ is likely to be small.

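Since

$$\bar{y}_{lrc} - \bar{Y} = \bar{y}_{st} - \bar{Y} + b_c(\bar{X}-\bar{x}_{st})$$

$$= [\bar{y}_{st} - \bar{Y} + B_c(\bar{X}-\bar{x}_{st})] + (b_c - B_c)(\bar{X}-\bar{x}_{st}) \tag{7.59}$$

it follows that if sampling errors of $b_c$ are negligible

$$V(\bar{y}_{lrc}) \doteq \sum_h \frac{W_h^2(1-f_h)}{n_h}(S_{yh}^2 - 2B_cS_{yxh} + B_c^2S_{xh}^2) \tag{7.60}$$

As an estimate of $V(\bar{y}_{lrc})$, we may take

$$v(\bar{y}_{lrc}) = \sum_h \frac{W_h^2(1-f_h)}{n_h(n_h-1)}\sum_i [(y_{hi}-\bar{y}_h) - b_c(x_{hi}-\bar{x}_h)]^2 \tag{7.61}$$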
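The separate and combined estimates with sample-based coefficients can be computed side by side. The sketch below follows (7.48) to (7.50), (7.56), and the weighted $b_c$ of this section; the function name and the two-stratum data are hypothetical, and each stratum sample is assumed to contain at least two units with differing x.

```python
def stratified_regression_estimates(samples, N_h, X_bar_h):
    """Return (y_lrs, y_lrc, b_c) from per-stratum samples of (x, y) pairs.

    samples : list of per-stratum samples, each a list of (x, y)
    N_h     : stratum sizes
    X_bar_h : known stratum means of x
    """
    N = sum(N_h)
    X_bar = sum(Nh * Xh for Nh, Xh in zip(N_h, X_bar_h)) / N
    y_lrs = y_st = x_st = num = den = 0.0
    for s, Nh, Xh in zip(samples, N_h, X_bar_h):
        n = len(s)
        W = Nh / N
        x_bar = sum(x for x, _ in s) / n
        y_bar = sum(y for _, y in s) / n
        sxx = sum((x - x_bar) ** 2 for x, _ in s)
        sxy = sum((x - x_bar) * (y - y_bar) for x, y in s)
        b_h = sxy / sxx                                  # (7.56)
        y_lrs += W * (y_bar + b_h * (Xh - x_bar))        # (7.48), (7.49)
        w = W ** 2 * (1 - n / Nh) / (n * (n - 1))        # weight used in b_c
        num += w * sxy
        den += w * sxx
        y_st += W * y_bar
        x_st += W * x_bar
    b_c = num / den
    y_lrc = y_st + b_c * (X_bar - x_st)                  # (7.50) with b = b_c
    return y_lrs, y_lrc, b_c


# Hypothetical data: a common slope of 2 but different intercepts by stratum.
samples = [
    [(2, 5), (4, 9), (7, 15)],                 # y = 1 + 2x
    [(10, 31), (12, 35), (15, 41), (18, 47)],  # y = 11 + 2x
]
y_lrs, y_lrc, b_c = stratified_regression_estimates(
    samples, N_h=[10, 20], X_bar_h=[5.0, 14.0])
print(y_lrs, y_lrc, b_c)
```

With exactly linear data and a common slope, both estimates reproduce the population mean, so any difference between them in real data reflects differing slopes or sampling error in the coefficients.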


7.11 COMPARISON OF THE TWO TYPES OF 
REGRESSION ESTIMATE 


Hard and fast rules cannot be given to decide whether the separate or the
combined estimate is better in any specific situation. The defects of the separate 
estimate are that it is more liable to bias when samples are small within the 
individual strata and that its variance has a larger contribution from sampling 
errors in the regression coefficients. The defect of the combined estimate is that its 
variance is inflated if the population regression coefficients differ from stratum to 
stratum. 

If we are confident that the regressions are linear and if B, appears to be 
roughly the same in all strata, the combined estimate is to be preferred. If the
regressions appear linear (so that the danger of bias seems small) but B, seems to 
vary markedly from stratum to stratum, the separate estimate is advisable. If there 
is some curvilinearity in the regressions when a linear regression estimate is used, 
the combined estimate is probably safer unless the samples are large in all strata. 

Estimators of the regression type that are unbiased have been developed by
Mickey (1959) and Williams (1963), but have not yet been extensively tried. Rao
(1969) found Mickey's estimator usually inferior to the standard regression and
ratio estimators in natural populations. A jackknifed version can also be
constructed. With n = mg, let $\bar{y}_{lr(j)}$ be the standard regression estimate, computed
from the sample with group j omitted (j = 1, 2, ..., g). Then the jackknifed form
is

$$\bar{y}_{lrJ} = g\,\bar{y}_{lr} - \frac{(g-1)}{g}\sum_{j=1}^{g}\bar{y}_{lr(j)} \tag{7.62}$$
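A jackknifed estimate of the form (7.62) can be sketched as follows. The grouping rule is an assumption made for illustration: the sample is split into g consecutive blocks of m units in the order given, which is sensible only if that order is random. Function names and data are hypothetical.

```python
def regression_estimate(pairs, X_bar):
    """Standard regression estimate of the mean from (x, y) pairs."""
    n = len(pairs)
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    return y_bar + (sxy / sxx) * (X_bar - x_bar)


def jackknifed_regression_estimate(pairs, X_bar, g):
    """g * (full estimate) - (g - 1) * (mean of leave-one-group-out estimates)."""
    n = len(pairs)
    if g < 2 or n % g != 0:
        raise ValueError("n must be a multiple of g, with g >= 2")
    m = n // g
    full = regression_estimate(pairs, X_bar)
    leave_out = []
    for j in range(g):
        kept = pairs[:j * m] + pairs[(j + 1) * m:]    # sample with group j omitted
        leave_out.append(regression_estimate(kept, X_bar))
    return g * full - (g - 1) * sum(leave_out) / g


# Hypothetical sample of n = 6, split into g = 3 groups of m = 2.
sample = [(3, 10.5), (8, 21.0), (5, 14.2), (11, 27.9), (6, 17.1), (9, 22.8)]
print(regression_estimate(sample, X_bar=7.5),
      jackknifed_regression_estimate(sample, X_bar=7.5, g=3))
```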
EXERCISES 
7.1 An experienced farmer makes an eye estimate of the weight of peaches $x_i$ on each
tree in an orchard of N = 200 trees. He finds a total weight of X = 11,600 lb. The peaches
are picked and weighed on a simple random sample of 10 trees, with the following results.

                                  Tree Number
                     1   2   3   4   5   6   7   8   9  10   Total

Actual wt. $y_i$    61  42  50  58  67  45  39  57  71  53    543
Est. wt. $x_i$      59  47  52  60  67  48  44  58  76  58    569

As an estimate of the total actual weight Y, we take

$$\hat{Y} = N[\bar{X} + (\bar{y} - \bar{x})]$$

Compute the estimate and find its standard error.

7.2 Does it appear that the linear regression estimate, with the sample least squares b, 
would give a more precise estimate in 7.1? 

7.3 From the sample data in Table 6.1 (p. 152) compute the regression estimate of the 
1930 total number of inhabitants in the 196 large cities. Find the approximate standard 
error of this estimate and compare its precision with that of the ratio estimate. 

7.4 In exercise 7.3 find the estimated total number of inhabitants and its standard error
if b is taken as 1.

7.5 In the following population with N = 5, verify (a) that the regression of y on x is
linear and (b) that the linear regression estimate is unbiased in simple random samples with
n = 3. The (y, x) pairs are (3, 0), (5, 0), (8, 2), (8, 3), (12, 3).


7.6 A rough measurement x, made on each unit, is related to the true measurement y
on the unit by the equation

$$x = y + e + d$$

where d is a constant bias and e is an error of measurement, uncorrelated with y, which has
mean zero and variance $S_e^2$ in the population, assumed infinite. In simple random samples
of size n compare the variances of (a) the "difference" estimate $[\bar{y} + (\bar{X} - \bar{x})]$ of the mean $\bar{Y}$
and (b) the linear regression estimate, using the value of b that gives minimum variance.
(The variances may depend on $S_y^2$.)

7.7 By working out all possible cases, compare the MSE’s of the separate and 
combined regression estimates of the total Y of the following population, when simple 
random samples of size 2 are drawn from each stratum. For each estimate, how much does 
its bias contribute to the MSE? 


     Stratum 1               Stratum 2
$x_{1i}$    $y_{1i}$    $x_{2i}$    $y_{2i}$
   4           0           5           7
   6           3           6          12
   7           5           8          13

Use the ordinary least squares estimates of the B's, $b_h$ and $b_c$ on pp. 201-2.


7.8 In the population of exercise 7.7, show that if the optimum preassigned B could be
used in each case, $V(\hat{Y}_{lrs}) = 4.39$, $V(\hat{Y}_{lrc}) = 4.43$, both estimates being, of course, unbiased.

7.9 By the same method, compare the MSE's of the separate and combined ratio
estimates in the population in exercise 7.7. Since the ratio Y/X is 8/17 = 0.47 in stratum 1
and 32/19 = 1.68 in stratum 2, large sample theory would suggest that $\hat{Y}_{Rs}$ would be
superior to $\hat{Y}_{Rc}$. You will find, however, that in these tiny samples, $\hat{Y}_{Rc}$ has the smaller
MSE. Its superiority is not due to smaller bias, neither $\hat{Y}_{Rs}$ nor $\hat{Y}_{Rc}$ being materially biased.
As another disagreement with large-sample theory, you will find that $\hat{Y}_{Rs}$ and $\hat{Y}_{Rc}$ have
smaller MSE's than the corresponding regression estimates.


CHAPTER 8


Systematic Sampling 


8.1 DESCRIPTION 


This method of sampling is at first sight quite different from simple random
sampling. Suppose that the N units in the population are numbered 1 to N in some 
order. To select a sample of n units, we take a unit at random from the first k units 
and every kth unit thereafter. For instance, if k is 15 and if the first unit drawn is 
number 13, the subsequent units are numbers 28, 43, 58, and so on. The selection 
of the first unit determines the whole sample. This type is called an every kth 
systematic sample. 

The apparent advantages of this method over simple random sampling are as 
follows. 


1. It is easier to draw a sample and often easier to execute without mistakes. 
This is a particular advantage when the drawing is done in the field. Even when 
drawing is done in an office there may be a substantial saving in time. For instance, 
if the units are described on cards that are all of the same size and lie in a file 
drawer, a card can be drawn out every inch along the file as measured by a ruler.
This operation is speedy, whereas simple random sampling would be slow. Of 
course, this method departs slightly from the strict “every kth” rule. 

2. Intuitively, systematic sampling seems likely to be more precise than simple 
random sampling. In effect, it stratifies the population into n strata, which consist 
of the first k units, the second k units, and so on. We might therefore expect the 
systematic sample to be about as precise as the corresponding stratified random 
sample with one unit per stratum. The difference is that with the systematic sample
the units occur at the same relative position in the stratum, whereas with the
stratified random sample the position in the stratum is determined separately by
randomization within each stratum (see Fig. 8.1). The systematic sample is spread
more evenly over the population, and this fact has sometimes made systematic
sampling considerably more precise than stratified random sampling.

One variant of the systematic sample is to choose each unit at or near the center
of the stratum; that is, instead of starting the sequence by a random number
chosen between 1 and k, we take the starting number as (k + 1)/2 if k is odd and
either k/2 or (k + 2)/2 if k is even (Madow, 1953). This procedure carries the idea
of systematic sampling to its logical conclusion. If $y_i$ can be considered a
continuous function of a continuous variable i, there are grounds for expecting
that this centrally located sample will be more precise than one randomly located.
Limited investigation on some natural populations supports this opinion,
although centrally located samples tend to behave erratically. Attention here will
be confined to samples with some random element.

[Fig. 8.1 Systematic and stratified random sampling. x = systematic sample, o = stratified random sample; horizontal axis: unit number, marked at k, 2k, ..., 6k.]


TABLE 8.1
THE POSSIBLE SYSTEMATIC SAMPLES FOR N = 23, k = 5

     Systematic sample number
   I    II   III    IV     V

   1     2     3     4     5
   6     7     8     9    10
  11    12    13    14    15
  16    17    18    19    20
  21    22    23


Since N is not in general an integral multiple of k, different systematic samples 
from the same finite population may vary by one unit in size. Thus, with N = 23, 
k = 5, the numbers of the units in the five systematic samples are shown in Table 
8.1. The first three samples have n =5 and the last two have n =4. This fact 
introduces a disturbance into the theory of systematic sampling. The disturbance 
is probably negligible if n exceeds 50 and will be ignored, for simplicity, in the 
presentation of theory. It is unlikely to be large even when n is small. 
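The every kth rule is simple to express in code. The sketch below is illustrative; the function name `systematic_sample` and its `rng` parameter are hypothetical conveniences, with units numbered from 1 as in the text.

```python
import random


def systematic_sample(N, k, start=None, rng=random):
    """Unit numbers (1-based) of an every-kth systematic sample.

    If start is omitted, it is drawn at random from the first k units.
    """
    if start is None:
        start = rng.randint(1, k)
    if not 1 <= start <= k:
        raise ValueError("start must lie between 1 and k")
    return list(range(start, N + 1, k))


# All five possible samples for N = 23, k = 5, and their sizes.
all_samples = [systematic_sample(23, 5, start) for start in range(1, 6)]
sizes = [len(s) for s in all_samples]
for s in all_samples:
    print(s)
```

Listing the samples for every start makes the unequal sample sizes visible whenever N is not a multiple of k.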

Another method, suggested by Lahiri in 1952 (see Murthy, 1967, p. 139) 
provides both a constant sample size and an unbiased sample mean. Regard the N 
units as arranged round a circle and let k now be the integer nearest to N/n. Select 
a random number between 1 and N and take every kth unit thereafter, going 
round the circle until the desired n units have been chosen. Suppose we want n = 5 
with N = 23. Then k = 5. If the random number is 19 we take units 19, 1, 6, 11, 16. 
It is easily verified that every unit has an equal probability of selection with this 

method. If n =4 units are wanted with N= 23 we take k =6. 
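The circular method can be sketched in the same way. This is a minimal illustration, not the original authors' procedure in code form: the function name is hypothetical, and "nearest integer" is implemented by rounding halves upward, which is an assumption since the text does not say how ties are broken.

```python
def circular_systematic_sample(N, n, start):
    """Take every kth unit round a circle of N units, beginning at `start`.

    k is the integer nearest to N/n (halves rounded up, by assumption).
    Units are numbered 1..N.
    """
    if not 1 <= start <= N:
        raise ValueError("start must lie between 1 and N")
    k = int(N / n + 0.5)
    return [(start - 1 + j * k) % N + 1 for j in range(n)]


print(circular_systematic_sample(23, 5, 19))

# Count how often each unit is selected over all N equally likely starts.
counts = {u: 0 for u in range(1, 24)}
for start in range(1, 24):
    for u in circular_systematic_sample(23, 5, start):
        counts[u] += 1
```

For N = 23 and n = 5 every unit is selected the same number of times over the 23 possible starts, which is the equal-probability property mentioned in the text.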
8.2 RELATION TO CLUSTER SAMPLING


There is another way of looking at systematic sampling. With N= nk, the k 
possible systematic samples are shown in the columns of Table 8.2. It is evident 
from this table that the population has been divided into k large sampling units, 
each of which contains n of the original units. The operation of choosing a 
randomly located systematic sample is just the operation of choosing one of these 
large sampling units at random. Thus systematic sampling amounts to the selec- 
tion of a single complex sampling unit that constitutes the whole sample. A 
systematic sample is a simple random sample of one cluster unit from a population
of k cluster units. 


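TABLE 8.2
COMPOSITION OF THE k SYSTEMATIC SAMPLES

                                   Sample number
         1                 2                ...   i                 ...   k

         $y_1$             $y_2$            ...   $y_i$             ...   $y_k$
         $y_{k+1}$         $y_{k+2}$        ...   $y_{k+i}$         ...   $y_{2k}$
         ...
         $y_{(n-1)k+1}$    $y_{(n-1)k+2}$   ...   $y_{(n-1)k+i}$    ...   $y_{nk}$
Means    $\bar{y}_1$       $\bar{y}_2$      ...   $\bar{y}_i$       ...   $\bar{y}_k$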


8.3 VARIANCE OF THE ESTIMATED MEAN 


Several formulas have been developed for the variance of $\bar{y}_{sy}$, the mean of a
systematic sample. The three given below apply to any kind of cluster sampling in
which the clusters contain n elements and the sample consists of one cluster. In
these formulas we assume N = nk.

If N = nk, it is easy to verify that $\bar{y}_{sy}$ is an unbiased estimate of $\bar{Y}$ for a randomly
located systematic sample.

In the following analysis the symbol $y_{ij}$ denotes the jth member of the ith
systematic sample, so that j = 1, 2, ..., n, i = 1, 2, ..., k. The mean of the ith
sample is denoted by $\bar{y}_{i.}$.

Theorem 8.1. The variance of the mean of a systematic sample is

$$V(\bar{y}_{sy}) = \frac{N-1}{N}S^2 - \frac{k(n-1)}{N}S_{wsy}^2 \tag{8.1}$$

where

$$S_{wsy}^2 = \frac{1}{k(n-1)}\sum_{i=1}^{k}\sum_{j=1}^{n}(y_{ij}-\bar{y}_{i.})^2$$

is the variance among units that lie within the same systematic sample. The
denominator of this variance, k(n - 1), is constructed by the usual rules in the
analysis of variance: each of the k samples contributes (n - 1) degrees of freedom
to the sum of squares in the numerator.

Proof. By the usual identity of the analysis of variance

$$(N-1)S^2 = \sum_i\sum_j (y_{ij}-\bar{Y})^2 = n\sum_i (\bar{y}_{i.}-\bar{Y})^2 + \sum_i\sum_j (y_{ij}-\bar{y}_{i.})^2$$

But the variance of $\bar{y}_{sy}$ is by definition

$$V(\bar{y}_{sy}) = \frac{1}{k}\sum_{i=1}^{k}(\bar{y}_{i.}-\bar{Y})^2$$

Hence

$$(N-1)S^2 = nkV(\bar{y}_{sy}) + k(n-1)S_{wsy}^2 \tag{8.2}$$

The result follows.


Corollary. The mean of a systematic sample is more precise than the mean of a
simple random sample if and only if

$$S_{wsy}^2 > S^2$$

Proof. If $\bar{y}$ is the mean of a simple random sample of size n,

$$V(\bar{y}) = \frac{N-n}{N}\,\frac{S^2}{n}$$

From (8.1), $V(\bar{y}_{sy}) < V(\bar{y})$ if and only if

$$\frac{N-1}{N}S^2 - \frac{k(n-1)}{N}S_{wsy}^2 < \frac{N-n}{N}\,\frac{S^2}{n} \tag{8.3}$$

that is, if

$$k(n-1)S_{wsy}^2 > \left(N-1-\frac{N-n}{n}\right)S^2 = k(n-1)S^2 \tag{8.4}$$


This important result, which applies to cluster sampling in general, states that
systematic sampling is more precise than simple random sampling if the variance
within the systematic samples is larger than the population variance as a whole.
Systematic sampling is precise when units within the same sample are heterogeneous
and is imprecise when they are homogeneous. The result is obvious intuitively.
If there is little variation within a systematic sample relative to that in the
population, the successive units in the sample are repeating more or less the same
information.
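Theorem 8.1 and its corollary can be verified on any small population with N = nk. The sketch below is illustrative: the function name is hypothetical, as are the two twelve-unit populations (a steady trend, and a periodic population whose period equals k).

```python
def systematic_variances(y, k):
    """Return (V_sy, V_ran, S2, S2_wsy) for a population list y with N = n*k."""
    N = len(y)
    if N % k != 0 or N // k < 2:
        raise ValueError("need N = n*k with n >= 2")
    n = N // k
    Y_bar = sum(y) / N
    S2 = sum((v - Y_bar) ** 2 for v in y) / (N - 1)
    samples = [y[i::k] for i in range(k)]            # the k systematic samples
    means = [sum(s) / n for s in samples]
    V_sy = sum((m - Y_bar) ** 2 for m in means) / k
    S2_wsy = sum((v - m) ** 2
                 for s, m in zip(samples, means) for v in s) / (k * (n - 1))
    V_ran = (N - n) / N * S2 / n
    return V_sy, V_ran, S2, S2_wsy


trend = list(range(1, 13))           # steady linear trend
periodic = [1, 5, 9] * 4             # period equal to k = 3

for name, pop in [("trend", trend), ("periodic", periodic)]:
    V_sy, V_ran, S2, S2_wsy = systematic_variances(pop, k=3)
    N, n, k = 12, 4, 3
    rhs = (N - 1) / N * S2 - k * (n - 1) / N * S2_wsy     # right side of (8.1)
    print(name, V_sy, rhs, V_ran, S2_wsy > S2)
```

In the periodic population every systematic sample repeats a single value, so the within-sample variance is zero and systematic sampling is much less precise than simple random sampling; in the trend population the reverse holds.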


Another form for the variance is given in theorem 8.2.

Theorem 8.2.

$$V(\bar{y}_{sy}) = \frac{S^2}{n}\left(\frac{N-1}{N}\right)[1 + (n-1)\rho_w] \tag{8.5}$$

where $\rho_w$ is the correlation coefficient between pairs of units that are in the same
systematic sample. It is defined as

$$\rho_w = \frac{E(y_{ij}-\bar{Y})(y_{iu}-\bar{Y})}{E(y_{ij}-\bar{Y})^2} \tag{8.6}$$

where the numerator is averaged over all kn(n - 1)/2 distinct pairs, and the
denominator over all N values of $y_{ij}$. Since the denominator is $(N-1)S^2/N$, this
gives

$$\rho_w = \frac{2}{(n-1)(N-1)S^2}\sum_{i=1}^{k}\sum_{j<u}(y_{ij}-\bar{Y})(y_{iu}-\bar{Y}) \tag{8.7}$$

Proof.

$$n^2kV(\bar{y}_{sy}) = n^2\sum_{i=1}^{k}(\bar{y}_{i.}-\bar{Y})^2 = \sum_{i=1}^{k}[(y_{i1}-\bar{Y}) + (y_{i2}-\bar{Y}) + \cdots + (y_{in}-\bar{Y})]^2$$

The squared terms amount to the total sum of squares of deviations from $\bar{Y}$, that
is, to $(N-1)S^2$. This gives

$$n^2kV(\bar{y}_{sy}) = (N-1)S^2 + 2\sum_i\sum_{j<u}(y_{ij}-\bar{Y})(y_{iu}-\bar{Y}) \tag{8.8}$$

$$= (N-1)S^2 + (n-1)(N-1)S^2\rho_w \tag{8.9}$$

Hence

$$V(\bar{y}_{sy}) = \frac{S^2}{n}\left(\frac{N-1}{N}\right)[1 + (n-1)\rho_w] \tag{8.10}$$


This result shows that positive correlation between units in the same sample
inflates the variance of the sample mean. Even a small positive correlation may
have a large effect, because of the multiplier (n - 1).
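The intraclass correlation of (8.7) and the variance formula (8.5) can be checked against a direct computation. This is a sketch with a hypothetical function name and a hypothetical twelve-unit population; it assumes N = nk with n at least 2.

```python
from itertools import combinations


def intraclass_check(y, k):
    """Return (rho_w, V_direct, V_formula) for a population list y with N = n*k."""
    N = len(y)
    n = N // k
    Y_bar = sum(y) / N
    S2 = sum((v - Y_bar) ** 2 for v in y) / (N - 1)
    samples = [y[i::k] for i in range(k)]
    # Cross products over all distinct pairs within the same systematic sample.
    cross = sum((a - Y_bar) * (b - Y_bar)
                for s in samples for a, b in combinations(s, 2))
    rho_w = 2 * cross / ((n - 1) * (N - 1) * S2)                     # (8.7)
    V_direct = sum((sum(s) / n - Y_bar) ** 2 for s in samples) / k
    V_formula = S2 / n * (N - 1) / N * (1 + (n - 1) * rho_w)         # (8.5)
    return rho_w, V_direct, V_formula


population = [3, 7, 4, 9, 12, 8, 15, 14, 13, 20, 18, 22]
rho_w, V_direct, V_formula = intraclass_check(population, k=4)
print(rho_w, V_direct, V_formula)
```

The two variances agree because (8.5) is an exact identity, whatever the sign of $\rho_w$.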

The two preceding theorems express $V(\bar{y}_{sy})$ in terms of $S^2$, hence relate it to the
variance for a simple random sample. There is an analogue of theorem 8.2 that
expresses $V(\bar{y}_{sy})$ in terms of the variance for a stratified random sample in which
the strata are composed of the first k units, the second k units, and so on. In our
notation the subscript j in $y_{ij}$ denotes the stratum. The stratum mean is written $\bar{y}_{.j}$.


Theorem 8.3.

$$V(\bar{y}_{sy}) = \frac{S_{wst}^2}{n}\left(\frac{N-n}{N}\right)[1 + (n-1)\rho_{wst}] \tag{8.11}$$

where

$$S_{wst}^2 = \frac{1}{n(k-1)}\sum_{j=1}^{n}\sum_{i=1}^{k}(y_{ij}-\bar{y}_{.j})^2 \tag{8.12}$$

This is the variance among units that lie in the same stratum. The divisor n(k - 1)
is used because each of the n strata contributes (k - 1) degrees of freedom.
Furthermore,

$$\rho_{wst} = \frac{E(y_{ij}-\bar{y}_{.j})(y_{iu}-\bar{y}_{.u})}{E(y_{ij}-\bar{y}_{.j})^2} \tag{8.13}$$

This quantity is the correlation between the deviations from the stratum means of
pairs of items that are in the same systematic sample.

$$\rho_{wst} = \frac{2}{n(n-1)(k-1)}\sum_{i=1}^{k}\sum_{j<u}\frac{(y_{ij}-\bar{y}_{.j})(y_{iu}-\bar{y}_{.u})}{S_{wst}^2} \tag{8.14}$$

The proof is similar to that of theorem 8.2.

Corollary. A systematic sample has the same precision as the corresponding
stratified random sample, with one unit per stratum, if $\rho_{wst} = 0$. This
follows because for this type of stratified random sample $V(\bar{y}_{st})$ is (theorem 5.3,
corollary 3)

$$V(\bar{y}_{st}) = \left(\frac{N-n}{N}\right)\frac{S_{wst}^2}{n} \tag{8.15}$$

Other formulas for $V(\bar{y}_{sy})$, appropriate to an autocorrelated population, have
been given by W. G. and L. H. Madow (1944), who made the first theoretical study
of the precision of systematic sampling.


Example. The data in Table 8.3 are for a small artificial population that exhibits a fairly
steady rising trend. We have N = 40, k = 10, n = 4. Each column represents a systematic
sample, and the rows are the strata. The example illustrates the situation in which the
"within-stratum" correlation is positive. For instance, in the first sample the deviations
from the strata means are all negative. This is consistently true, with a few exceptions, in
the first five systematic samples. In the last five samples, deviations from the strata means
are mostly positive. Thus the cross-product terms in $\rho_{wst}$ are predominantly positive. From
theorem 8.3 we expect systematic sampling to be less precise than stratified random sampling
with one unit per stratum.

The variance $V(\bar{y}_{sy})$ is found directly from the systematic sample totals as

$$V_{sy} = V(\bar{y}_{sy}) = \frac{1}{k}\sum_{i=1}^{k}(\bar{y}_{i.}-\bar{Y})^2 = \frac{1}{n^2k}\sum_{i=1}^{k}(n\bar{y}_{i.}-n\bar{Y})^2$$

$$= \frac{1}{160}\left[(50)^2 + (58)^2 + \cdots + (88)^2 - \frac{(727)^2}{10}\right] = 11.63$$

For random and stratified random sampling, we need an analysis of variance of the
population into "between rows" and "within rows." This is presented in Table 8.4.

TABLE 8.3
DATA FOR 10 SYSTEMATIC SAMPLES WITH n = 4, N = kn = 40

                    Systematic sample numbers                   Strata
Strata     1    2    3    4    5    6    7    8    9   10       means

I          0    1    1    2    5    4    7    7    8    6         4.1
II         6    8    9   10   13   12   15   16   16   17        12.2
III       18   19   20   20   24   23   25   28   29   27        23.3
IV        26   30   31   31   33   32   35   37   38   38        33.1
Totals    50   58   61   63   75   71   82   88   91   88         727

TABLE 8.4
ANALYSIS OF VARIANCE

                           df       ss        ms

Between rows (strata)       3     4828.3
Within strata              36      485.5     13.49 = $S_{wst}^2$
Totals                     39     5313.8    136.25 = $S^2$

Hence the variances of the estimated means from simple random and stratified random samples
are as follows.

$$V_{ran} = \left(\frac{N-n}{N}\right)\frac{S^2}{n} = \frac{9}{10}\cdot\frac{136.25}{4} = 30.66$$

$$V_{st} = \left(\frac{N-n}{N}\right)\frac{S_{wst}^2}{n} = \frac{9}{10}\cdot\frac{13.49}{4} = 3.04$$
Both stratified random sampling and systematic sampling are much more 
effective than simple random sampling but, as anticipated, systematic sampling is 
less precise than stratified random sampling. 
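
The three variances in this example can be recomputed directly from the data. The following is a minimal sketch in Python, not part of the original text; the variable names are illustrative, and the data are the entries of Table 8.3. Because it carries $S_{wst}^2$ at full precision instead of the rounded 13.49, its value of $V_{st}$ can differ from 3.04 in the last printed digit.

```python
# Table 8.3: rows are strata I-IV, columns are the k = 10 systematic samples.
rows = [
    [0, 1, 1, 2, 5, 4, 7, 7, 8, 6],
    [6, 8, 9, 10, 13, 12, 15, 16, 16, 17],
    [18, 19, 20, 20, 24, 23, 25, 28, 29, 27],
    [26, 30, 31, 31, 33, 32, 35, 37, 38, 38],
]
n, k = len(rows), len(rows[0])
N = n * k
values = [y for row in rows for y in row]
grand_mean = sum(values) / N

# Systematic sampling: variance of the k possible sample means.
sample_means = [sum(row[c] for row in rows) / n for c in range(k)]
v_sy = sum((m - grand_mean) ** 2 for m in sample_means) / k

# Analysis of variance: total and within-strata sums of squares (Table 8.4).
total_ss = sum((y - grand_mean) ** 2 for y in values)
within_ss = sum((y - sum(row) / k) ** 2 for row in rows for y in row)
s2 = total_ss / (N - 1)
s2_wst = within_ss / (n * (k - 1))

# Simple random and stratified random sampling (one unit per stratum).
fpc = (N - n) / N
v_ran = fpc * s2 / n
v_st = fpc * s2_wst / n

print(round(v_sy, 2), round(v_st, 2), round(v_ran, 2))
```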

Table 8.5 shows the same data, with the order of the observations reversed in
the second and fourth strata. This has the effect of making $\rho_{wst}$ negative, because it
makes the majority of the cross products between deviations from the strata 
means negative for pairs of observations that lie in the same systematic sample. In 
the first systematic sample, for instance, the deviations from the strata means are 
now —4.1, +4.8, —5.3, +4.9. Of the six products of pairs of deviations, four are 
negative. Roughly the same situation applies in every systematic sample. 

This change does not affect $V_{ran}$ and $V_{st}$. With systematic sampling, it brings
about a dramatic increase in precision, as is seen when the systematic sample totals
in Table 8.5 are compared with those in Table 8.3. We now have

$$V_{sy} = \frac{1}{160}\left[(73)^2 + (74)^2 + \cdots + (65)^2 - \frac{(727)^2}{10}\right] = 0.46$$

TABLE 8.5
DATA IN TABLE 8.3, WITH THE ORDER REVERSED IN STRATA II AND IV

                    Systematic sample numbers                  Strata
Strata      1    2    3    4    5    6    7    8    9   10     means
I           0    1    1    2    5    4    7    7    8    6       4.1
II         17   16   16   15   12   13   10    9    8    6      12.2
III        18   19   20   20   24   23   25   28   29   27      23.3
IV         38   38   37   35   32   33   31   31   30   26      33.1
Totals     73   74   74   72   73   73   73   75   75   65      72.7
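
The effect of the reversal can be checked numerically. The sketch below is illustrative only (the function names `sample_totals` and `v_sy` are not from the text); it rebuilds Table 8.5 from Table 8.3 by reversing strata II and IV and recomputes the systematic-sample totals and the variance of the sample mean.

```python
# Table 8.3 (rows are strata I-IV, columns are the 10 systematic samples).
table_8_3 = [
    [0, 1, 1, 2, 5, 4, 7, 7, 8, 6],
    [6, 8, 9, 10, 13, 12, 15, 16, 16, 17],
    [18, 19, 20, 20, 24, 23, 25, 28, 29, 27],
    [26, 30, 31, 31, 33, 32, 35, 37, 38, 38],
]

# Table 8.5: reverse the order within strata II and IV (rows 1 and 3, zero-based).
table_8_5 = [row[::-1] if idx in (1, 3) else list(row)
             for idx, row in enumerate(table_8_3)]


def sample_totals(table):
    """Total of each systematic sample, i.e. each column of the table."""
    return [sum(col) for col in zip(*table)]


def v_sy(table):
    """Variance of the systematic-sample mean over the k possible samples."""
    n, k = len(table), len(table[0])
    totals = sample_totals(table)
    mean_total = sum(totals) / k
    return sum((t - mean_total) ** 2 for t in totals) / (n * n * k)


print(sample_totals(table_8_5))
print(round(v_sy(table_8_3), 2), round(v_sy(table_8_5), 2))
```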


It is sometimes possible to exploit this result by numbering the units to create
negative correlations within strata. Accurate knowledge of the trends within the
population is required. However, as will be seen later, the situation in Table 8.5 is
one in which it is difficult to obtain from the sample a good estimate of the
standard error of $\bar{y}_{sy}$.


8.4 COMPARISON OF SYSTEMATIC WITH STRATIFIED 
RANDOM SAMPLING 


The performance of systematic sampling in relation to that of stratified or
simple random sampling is greatly dependent on the properties of the population.
There are populations for which systematic sampling is extremely precise and
others for which it is less precise than simple random sampling. For some
populations and some values of n, $V(\bar{y}_{sy})$ may even increase when a larger sample
is taken, a startling departure from good behavior. Thus it is difficult to give
general advice about the situations in which systematic sampling is to be
recommended. A knowledge of the structure of the population is necessary for its most
effective use.

Two lines of research on this problem have been followed. One is to compare
the different types of sampling on artificial populations in which $y_i$ is some simple
function of i. The other is to make the comparisons for natural populations. Some
of the principal results are presented in the succeeding sections.


8.5 POPULATIONS IN “RANDOM” ORDER 


Systematic sampling is sometimes used, for its convenience, in populations in
which the numbering of the units is effectively random. This is so in sampling from
a file arranged alphabetically by surnames, if the item that is being measured has
no relation to the surname of the individual. There will then be no trend or
stratification in $y_i$ as we proceed along the file and no correlation between
neighboring values.

In this situation we would expect systematic sampling to be essentially
equivalent to simple random sampling and to have the same variance. For any single finite
population, with given values of n and k, this is not exactly true, because $V_{sy}$,
which is based on only k degrees of freedom, is rather erratic when k is small and
may turn out to be either greater or smaller than $V_{ran}$. There are two results which
show that on the average the two variances are equal.


Theorem 8.4. Consider all N! finite populations that are formed by the N!
permutations of any set of numbers $y_1, y_2, \ldots, y_N$. Then, on the average over
these finite populations,

$$E(V_{sy}) = V_{ran} \qquad (8.16)$$

Note that $V_{ran}$ is the same for all permutations.

This result, proved by W. G. and L. H. Madow (1944), shows that if the order of
the items in a specific finite population can be regarded as drawn at random from
the N! permutations, systematic sampling is on the average equivalent to simple
random sampling.
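
For a very small population the statement of theorem 8.4 can be confirmed by brute force. The sketch below is an illustration, not a proof; the six values and the choice n = 2, k = 3 are arbitrary assumptions. It averages $V_{sy}$ over all 6! orderings in exact rational arithmetic and compares the result with $V_{ran}$.

```python
from fractions import Fraction
from itertools import permutations


def v_sy(pop, k):
    """Variance of the systematic-sample mean for one ordering of the population."""
    N = len(pop)
    n = N // k
    mean = Fraction(sum(pop), N)
    sample_means = [Fraction(sum(pop[start::k]), n) for start in range(k)]
    return sum((m - mean) ** 2 for m in sample_means) / k


def v_ran(pop, n):
    """Variance of the mean of a simple random sample of size n."""
    N = len(pop)
    mean = Fraction(sum(pop), N)
    s2 = sum((y - mean) ** 2 for y in pop) / (N - 1)
    return Fraction(N - n, N) * s2 / n


values = [3, 7, 8, 12, 20, 31]   # arbitrary illustrative numbers
k, n = 3, 2

perms = list(permutations(values))
average_v_sy = sum(v_sy(p, k) for p in perms) / len(perms)

print(average_v_sy, v_ran(values, n))
```

Individual orderings give values of $V_{sy}$ both above and below $V_{ran}$; only the average over the permutations agrees.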

The second approach is to regard the finite population as drawn at random from
an infinite superpopulation which has certain properties. The result that is proved
does not apply to any single finite population (i.e., to any specific set of values
$y_1, y_2, \ldots, y_N$) but to the average of all finite populations that can be drawn from
the infinite population.

The symbol $\mathcal{E}$ denotes averages over all finite populations that can be drawn
from this superpopulation.


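Theorem 8.5. If the variates $y_i$ (i = 1, 2, ..., N) are drawn at random from a
superpopulation in which

$$\mathcal{E}y_i = \mu, \qquad \mathcal{E}(y_i - \mu)(y_j - \mu) = 0 \quad (i \ne j), \qquad \mathcal{E}(y_i - \mu)^2 = \sigma_i^2$$

then

$$\mathcal{E}V_{sy} = \mathcal{E}V_{ran}$$

The crucial conditions are that all $y_i$ have the same mean $\mu$, that is, there is no
trend, and that no linear correlation exists between the values $y_i$ and $y_j$ at two
different points. The variance $\sigma_i^2$ may change from point to point in the series.

Proof. For any specific finite population,

$$V_{ran} = \frac{N-n}{Nn}\cdot\frac{\sum_{i=1}^{N}(y_i - \bar{Y})^2}{N-1}$$

Now

$$\sum_{i=1}^{N}(y_i - \bar{Y})^2 = \sum_{i=1}^{N}[(y_i - \mu) - (\bar{Y} - \mu)]^2 = \sum_{i=1}^{N}(y_i - \mu)^2 - N(\bar{Y} - \mu)^2$$

Since $y_i$ and $y_j$ are uncorrelated ($i \ne j$),

$$\mathcal{E}(\bar{Y} - \mu)^2 = \frac{1}{N^2}\sum_{i=1}^{N}\sigma_i^2 \qquad (8.17)$$

Hence

$$\mathcal{E}V_{ran} = \frac{N-n}{Nn(N-1)}\left(\sum_{i=1}^{N}\sigma_i^2 - \frac{1}{N}\sum_{i=1}^{N}\sigma_i^2\right) \qquad (8.18)$$

This gives

$$\mathcal{E}V_{ran} = \frac{N-n}{N^2 n}\sum_{i=1}^{N}\sigma_i^2 \qquad (8.19)$$

Turning to $V_{sy}$, let $\bar{y}_u$ denote the mean of the uth systematic sample. For any
specific finite population,

$$V_{sy} = \frac{1}{k}\sum_{u=1}^{k}(\bar{y}_u - \bar{Y})^2 \qquad (8.20)$$

$$= \frac{1}{k}\left[\sum_{u=1}^{k}(\bar{y}_u - \mu)^2 - k(\bar{Y} - \mu)^2\right] \qquad (8.21)$$

By the theorem for the variance of the mean of an uncorrelated sample from an
infinite population,

$$\mathcal{E}V_{sy} = \frac{1}{k}\left(\frac{\sum_{i=1}^{N}\sigma_i^2}{n^2} - \frac{k\sum_{i=1}^{N}\sigma_i^2}{N^2}\right) \qquad (8.22)$$

$$= \frac{N-n}{N^2 n}\sum_{i=1}^{N}\sigma_i^2 = \mathcal{E}V_{ran} \qquad (8.23)$$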


8.6 POPULATIONS WITH LINEAR TREND 


If the population consists solely of a linear trend, as illustrated in Fig. 8.2, it is
fairly easy to guess the nature of the results. From Fig. 8.2, it looks as if $V_{sy}$ and $V_{st}$
(with one unit per stratum) will both be smaller than $V_{ran}$. Furthermore, $V_{sy}$ will be
larger than $V_{st}$, for if the systematic sample is too low in one stratum it is too low in
all strata, whereas stratified random sampling gives an opportunity for within-stratum
errors to cancel.

Fig. 8.2 Systematic sampling in a population with linear trend.
(x = systematic sample; o = stratified random sample)
To examine the effects mathematically, we may assume that $y_i = i$. We have

$$\sum_{i=1}^{N} i = \frac{N(N+1)}{2}, \qquad \sum_{i=1}^{N} i^2 = \frac{N(N+1)(2N+1)}{6}$$

The population variance $S^2$ is given by

$$S^2 = \frac{1}{N-1}\left(\sum_{i=1}^{N} y_i^2 - N\bar{Y}^2\right) = \frac{1}{N-1}\left[\frac{N(N+1)(2N+1)}{6} - \frac{N(N+1)^2}{4}\right] = \frac{N(N+1)}{12} \qquad (8.24)$$

Hence the variance of the mean of a simple random sample is

$$V_{ran} = \frac{N-n}{N}\cdot\frac{S^2}{n} = \frac{n(k-1)}{N}\cdot\frac{N(N+1)}{12n} = \frac{(k-1)(N+1)}{12} \qquad (8.25)$$

To find the variance within strata, $S_w^2$, we need only replace N by k in (8.24).
This gives

$$V_{st} = \frac{N-n}{N}\cdot\frac{S_w^2}{n} = \frac{(k-1)}{k}\cdot\frac{k(k+1)}{12n} = \frac{k^2-1}{12n} \qquad (8.26)$$

For systematic sampling, the mean of the second sample exceeds that of the first
by 1; the mean of the third exceeds that of the second by 1, and so on. Thus the
means $\bar{y}_u$ may be replaced by the numbers 1, 2, ..., k. Hence, by a further
application of (8.24),

$$\sum_{u=1}^{k}(\bar{y}_u - \bar{Y})^2 = \frac{k(k^2-1)}{12}$$

This gives

$$V_{sy} = \frac{1}{k}\sum_{u=1}^{k}(\bar{y}_u - \bar{Y})^2 = \frac{k^2-1}{12} \qquad (8.27)$$

From the formulas (8.25), (8.26), and (8.27) we deduce, as anticipated,

$$V_{st} = \frac{k^2-1}{12n} \le V_{sy} = \frac{k^2-1}{12} \le V_{ran} = \frac{(k-1)(N+1)}{12} \qquad (8.28)$$

Equality occurs only when n = 1. Thus, for removing the effect of a linear trend,
suspected or unsuspected, the systematic sample is much more effective than the
simple random sample but less effective than the stratified random sample.
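
The closed forms (8.25) to (8.27) can be checked against a direct computation on the population $y_i = i$. The following sketch is illustrative (the function name `variances_linear_trend` is not from the text) and uses exact fractions so that the comparison with the formulas is not blurred by rounding. It assumes k > 1.

```python
from fractions import Fraction


def variances_linear_trend(n, k):
    """Return (V_st, V_sy, V_ran) for the population y_i = i, i = 1..N, N = nk."""
    N = n * k
    y = list(range(1, N + 1))
    mean = Fraction(sum(y), N)

    # Simple random sampling.
    s2 = sum((v - mean) ** 2 for v in y) / (N - 1)
    v_ran = Fraction(N - n, N) * s2 / n

    # Stratified random sampling, one unit from each stratum of k consecutive units.
    within_ss = 0
    for j in range(n):
        stratum = y[j * k:(j + 1) * k]
        stratum_mean = Fraction(sum(stratum), k)
        within_ss += sum((v - stratum_mean) ** 2 for v in stratum)
    s2_w = within_ss / (n * (k - 1))
    v_st = Fraction(N - n, N) * s2_w / n

    # Systematic sampling: variance of the k possible sample means.
    sample_means = [Fraction(sum(y[start::k]), n) for start in range(k)]
    v_sy = sum((m - mean) ** 2 for m in sample_means) / k
    return v_st, v_sy, v_ran


n, k = 4, 10
v_st, v_sy, v_ran = variances_linear_trend(n, k)
print(v_st, v_sy, v_ran)
```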


8.7. METHODS FOR POPULATIONS WITH LINEAR TRENDS 


The performance of systematic sampling in the presence of a linear trend can be
improved in several ways. One is to use a centrally located sample. Another is to
change the estimate from an unweighted to a weighted mean in which all internal
members of the sample have weight unity (before division by n) but different
weights are given to the first and last members. If the random number drawn
between 1 and k is i, these weights are

$$1 \pm \frac{n(2i - k - 1)}{2(n-1)k} \qquad (8.29)$$

the + sign being used for the first member, the - sign for the last. For any i, the two
weights obviously add to 2. The reader may verify that if the population consists of
a linear trend and N = nk the weighted sample mean gives the correct population
mean. The performance of these end corrections has been examined by Yates
(1948), to whom they are due.

Bellhouse and Rao (1975) have extended these corrections to the case
$N \ne nk$ when the systematic sample is drawn by a circular method, the sample
members being taken in the order of their unit numbers in the
population. For example, if the starting random number is i =
19 with N = 23, n = 5, units 19, 1, 6, 11, 16 constituting the sample, the first and
last members are $y_1$ and $y_{19}$. Two cases arise.

Case 1. Small i for which $i + (n-1)k \le N$, so that the n units are obtained
without passing over $y_N$. The weights for the first (+) and last (-) members are

$$1 \pm \frac{n[2i + (n-1)k - (N+1)]}{2(n-1)k} \qquad (8.30)$$

Case 2. $i + (n-1)k > N$. Let $n_2$ be the number of sample units obtained after
passing over $y_N$. Thus, with i = 19, $n_2 = 4$. The weights for the first (+) and last (-)
members are

$$1 \pm \frac{n}{2(N-k)}\left[2i + (n-1)k - (N+1) - \frac{2n_2 N}{n}\right] \qquad (8.31)$$

In both cases the internal sample members receive weight 1 in the sample total.
With N = 23, n = k = 5, i = 19, $n_2 = 4$, the first and last weights are $1 \pm (-7/18)$.
Hence $y_1$ receives a weight 11/18, while $y_{19}$ receives 25/18.
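
The end corrections can be illustrated numerically. The sketch below is a hedged reading of (8.30) and (8.31): the function names are illustrative, the circular selection rule (continue past unit N to unit 1) is assumed from the worked example, and the sample is ordered by unit number before the first and last members are reweighted. It reproduces the weights 11/18 and 25/18 and checks that a linear trend is estimated without error.

```python
from fractions import Fraction


def end_correction(N, n, k, i):
    """Units (sorted by unit number) and end-corrected weights for random start i."""
    # Circular selection: continue past unit N back to unit 1.
    units = [((i - 1 + j * k) % N) + 1 for j in range(n)]
    # Number of sample units obtained after passing over y_N.
    n2 = sum(1 for j in range(n) if i + j * k > N)
    if n2 == 0:
        # Case 1, formula (8.30).
        c = Fraction(n * (2 * i + (n - 1) * k - (N + 1)), 2 * (n - 1) * k)
    else:
        # Case 2, formula (8.31).
        bracket = 2 * i + (n - 1) * k - (N + 1) - Fraction(2 * n2 * N, n)
        c = n * bracket / (2 * (N - k))
    units.sort()
    weights = [Fraction(1)] * n
    weights[0] += c      # + sign for the first member
    weights[-1] -= c     # - sign for the last member
    return units, weights


def weighted_mean(units, weights, y):
    """Weighted sample mean; y maps a unit number to its value."""
    return sum(w * y(u) for u, w in zip(units, weights)) / len(units)


units, weights = end_correction(N=23, n=5, k=5, i=19)
print(units, weights[0], weights[-1])
# For a pure linear trend y_u = u the population mean is (N + 1) / 2 = 12.
print(weighted_mean(units, weights, lambda u: u))
```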

Two alternative methods attempt to change the method of sample selection so
that the sample mean is unaffected by a linear trend. With N = nk and n even, a
method suggested by Sethi (1965) divides the population into n/2 strata of size 2k,
choosing two units equidistant from the end of each stratum. With starting
random number i, the n/2 pairs of units are those numbered

$$[i + 2jk, \; 2(j+1)k - i + 1], \qquad j = 0, 1, 2, \ldots, \tfrac{1}{2}n - 1 \qquad (8.32)$$

This selection removes the effect of a linear trend in any stratum of 2k units,
even if the linear slope varies from stratum to stratum. Murthy (1967) has called
the method balanced systematic sampling.

The modified method of Singh et al. (1968) chooses pairs of units equidistant
from the ends of the population. With n even, the n/2 equidistant pairs that start
with unit i (i = 1, 2, ..., k) are

$$[i + jk, \; (N - jk) - i + 1], \qquad j = 0, 1, 2, \ldots, \tfrac{1}{2}n - 1 \qquad (8.33)$$

With n odd in these methods, j goes up to $\tfrac{1}{2}(n-1) - 1$ in (8.32) and (8.33). The
balanced method (8.32) adds the remaining sample member near the end at
$[i + (n-1)k]$; the modified method near the middle at $[i + \tfrac{1}{2}(n-1)k]$. The effect of
a linear trend is not completely eliminated in $\bar{y}$ for n odd.

Comparisons of the performances of these two methods with Yates’ corrections 
and with ordinary systematic sampling have been made on superpopulation 
models representing linear and parabolic trends, periodic and autocorrelated 
variation (Bellhouse and Rao, 1975), and on a few small natural populations by 
these authors and by Singh (personal communication). In general the three 
methods (Yates, balanced, modified) performed similarly, being superior to 
ordinary systematic sampling in the presence of a linear or parabolic trend. 

The population in Table 8.3, for example, is one on which these methods
should perform very well. Ordinary systematic sampling gave $V_{sy}$ = 11.63.
Comparable variances for the other methods (n = 4, k = 10) are: Yates, 1.29; Sethi
(balanced), 0.46; Singh (modified), 0.34. The balanced method happens to be that
obtained in Table 8.5 by reversal of strata II and IV in Table 8.3.
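
The balanced and modified selections (8.32) and (8.33) are easy to generate and to try on the population of Table 8.3. The following sketch is illustrative only: the function names are assumptions, n is taken to be even, and the population is read from Table 8.3 stratum by stratum so that units 1 to 10 form stratum I, units 11 to 20 stratum II, and so on. It does not attempt the Yates end corrections.

```python
population = (
    [0, 1, 1, 2, 5, 4, 7, 7, 8, 6]
    + [6, 8, 9, 10, 13, 12, 15, 16, 16, 17]
    + [18, 19, 20, 20, 24, 23, 25, 28, 29, 27]
    + [26, 30, 31, 31, 33, 32, 35, 37, 38, 38]
)


def ordinary_units(i, n, k):
    """Ordinary systematic sample: every kth unit starting at i."""
    return [i + j * k for j in range(n)]


def balanced_units(i, n, k):
    """Balanced systematic sample, formula (8.32), n even."""
    units = []
    for j in range(n // 2):
        units += [i + 2 * j * k, 2 * (j + 1) * k - i + 1]
    return units


def modified_units(i, n, k):
    """Modified systematic sample, formula (8.33), n even."""
    N = n * k
    units = []
    for j in range(n // 2):
        units += [i + j * k, (N - j * k) - i + 1]
    return units


def variance_of_mean(pop, selector, n, k):
    """Variance of the sample mean over the k equally likely starts i = 1..k."""
    grand_mean = sum(pop) / len(pop)
    means = [sum(pop[u - 1] for u in selector(i, n, k)) / n
             for i in range(1, k + 1)]
    return sum((m - grand_mean) ** 2 for m in means) / k


n, k = 4, 10
for name, selector in [("ordinary", ordinary_units),
                       ("balanced", balanced_units),
                       ("modified", modified_units)]:
    print(name, round(variance_of_mean(population, selector, n, k), 2))
```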


8.8 POPULATIONS WITH PERIODIC VARIATION 


If the population consists of a periodic trend, for example, a simple sine curve,
the effectiveness of the systematic sample depends on the value of k. This may be
seen pictorially in Fig. 8.3. In this representation the height of the curve is the
observation $y_i$. The sample points A represent the case least favorable to the
systematic sample. This case holds whenever k is equal to the period of the sine
curve or is an integral multiple of the period. Every observation within the
systematic sample is exactly the same, so that the sample is no more precise than a
single observation taken at random from the population.

Fig. 8.3 Periodic variation.


The most favorable case (sample B) occurs when k is an odd multiple of the
half-period. Every systematic sample has a mean exactly equal to the true
population mean, since successive deviations above and below the middle line
cancel. The sampling variance of the mean is therefore zero. Between these two
cases the sample has various degrees of effectiveness, depending on the relation
between k and the wavelength.
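
The two extreme cases can be reproduced on a small artificial sine-curve population. The sketch below is illustrative; the period of 8 units and N = 32 are arbitrary assumptions chosen so that both k = 8 (the period) and k = 4 (the half-period) divide N.

```python
import math


def v_sy(pop, k):
    """Variance of the systematic-sample mean over the k possible starts."""
    N = len(pop)
    n = N // k
    mean = sum(pop) / N
    sample_means = [sum(pop[start::k]) / n for start in range(k)]
    return sum((m - mean) ** 2 for m in sample_means) / k


period = 8
N = 32
pop = [math.sin(2 * math.pi * i / period) for i in range(1, N + 1)]

# Variance of a single observation drawn at random from the population.
pop_mean = sum(pop) / N
single_obs_var = sum((y - pop_mean) ** 2 for y in pop) / N

v_worst = v_sy(pop, k=period)        # k equal to the period (sample A)
v_best = v_sy(pop, k=period // 2)    # k equal to the half-period (sample B)

print(v_worst, single_obs_var, v_best)
```

With k equal to the period the systematic sample of n = 4 units is no better than one randomly chosen observation; with k equal to the half-period the variance vanishes apart from floating-point error.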

Populations that exhibit an exact sine curve are not likely to be encountered in
practice. Populations with a more or less definite periodic trend are, however, not
uncommon. Examples are the flow of road traffic past a point on a road over 24
hours of the day and store sales over seven days of the week. For estimating an
average over a time period, a systematic sample daily at 4 p.m. or every Tuesday
would obviously be unwise. Instead, the strategy is to stagger the sample over the
periodic curve, for example, by seeing that every weekday is equally represented
in the case of store sales.

Some populations have a kind of periodic effect that is less obvious. A series of
weekly payrolls in a small sector of a factory may always list the workers in the
same order and may contain between 19 and 23 names every week. A systematic
sample of 1 in 20 names over a period of weeks might consist mainly of the records
of one worker or of the records of two or three workers. Similarly, a systematic
sample of names from a city directory might contain too many heads of
households, or too many children. If there is time to study the periodic structure, a
systematic sample can usually be designed to capitalize on it. Failing this, a simple
or stratified random sample is preferable when a periodic effect is suspected but
not well known.

In some natural populations quasiperiodic variation may be present that would
be difficult to anticipate. L. H. Madow (1946) found evidence pointing this way in
a bed of hardwood seedling stock in a rather small population (N = 420). Finney
(1950) discussed a similar phenomenon in timber volume per strip in the Dehra
Dun forest, although in a reexamination of the data Milne (1959) suggested that
the apparent periodicity might have been produced by the process of
measurement. The effect of quasiperiodicity is that systematic sampling performs poorly at
some values of n and particularly well for others. Whether this effect occurs
frequently is not known. Matérn (1960) cites examples in which natural forces
(e.g., tides) might produce a spatial periodic variation, but he is of the opinion that
no clear case has been found in forest surveys.


8.9 AUTOCORRELATED POPULATIONS 


With many natural populations, there is reason to expect that two observations
$y_i$, $y_j$ will be more nearly alike when i and j are close together in the series than
when they are distant. This happens whenever natural forces induce a slow change
as we proceed along the series. In a mathematical model for this effect we may
suppose that $y_i$ and $y_j$ are positively correlated, the correlation between them
being a function solely of their distance apart, i - j, and diminishing as this distance
increases. Although this model is oversimplified, it may represent one of the
salient features of many natural populations.

In order to investigate whether this model does apply to a population, we can
calculate the set of correlations $\rho_u$ for pairs of items that are u units apart and plot
this correlation against u. This curve, or the function it represents, is called a
correlogram. Even if the model is valid, the correlogram will not be a smooth
function for any finite population because irregularities are introduced by the
finite nature of the population. In a comparison of systematic with stratified
random sampling for this model these irregularities make it difficult to derive
results for any single finite population. The comparison can be made over the
average of a whole series of finite populations, which are drawn at random from an
infinite superpopulation to which the model applies. This technique has already
been applied in theorem 8.5 and in sections 6.7, 7.8.

Thus we assume that the observations $y_i$ (i = 1, 2, ..., N) are drawn from a
superpopulation in which

$$\mathcal{E}(y_i) = \mu, \qquad \mathcal{E}(y_i - \mu)^2 = \sigma^2, \qquad \mathcal{E}(y_i - \mu)(y_{i+u} - \mu) = \rho_u \sigma^2 \qquad (8.34)$$

where

$$\rho_u \ge \rho_v \ge 0, \quad \text{whenever } u < v$$

The drawing of one set of $y_i$ from this superpopulation creates a single finite
population of size N.

The average variance for systematic sampling is denoted by

$$\mathcal{E}V_{sy} = \mathcal{E}E(\bar{y}_{sy} - \bar{Y})^2$$

For this class of populations it is easy to show that stratified random sampling is
superior to simple random sampling, but no general result can be established
about systematic sampling. Within the class there are superpopulations in which
systematic sampling is superior to stratified random sampling, but there are also
superpopulations in which systematic sampling is inferior to simple random
sampling for certain values of k.

A general theorem can be obtained if it is further assumed that the correlogram
is concave upwards.

Theorem 8.6. If, in addition to conditions (8.34), we have

$$\delta_u^2 = \rho_{u+1} + \rho_{u-1} - 2\rho_u \ge 0 \qquad [u = 2, 3, \ldots, (kn - 2)] \qquad (8.35)$$

then

$$\mathcal{E}V_{sy} \le \mathcal{E}V_{st} \le \mathcal{E}V_{ran} \qquad (8.36)$$

for any size of sample. Furthermore, unless $\delta_u^2 = 0$, u = 2, 3, ..., (kn - 2),

$$\mathcal{E}V_{sy} < \mathcal{E}V_{st} \qquad (8.37)$$

A proof has been given by Cochran (1946).


A sketch of the argument for n = 2 illustrates the role played by the "concave
upwards" condition. In the systematic sample the members of the pair are always
k units apart. Hence

$$\mathcal{E}V(\bar{y}_{sy}) = \tfrac{1}{4}(\sigma^2 + \sigma^2 + 2\rho_k\sigma^2) = \tfrac{1}{2}\sigma^2(1 + \rho_k) \qquad (8.38)$$

With the stratified sample, there are k possible positions for the unit drawn from
each stratum, making $k^2$ combinations of positions. The numbers of combinations
1, 2, ..., (2k - 1) units apart are as follows.

Distance    1    2    ...    (k-1)    k    (k+1)    ...    (2k-1)    Total
Number      1    2    ...    (k-1)    k    (k-1)    ...    1         k^2

Hence the average value of $V(\bar{y}_{st})$, taken over the $k^2$ combinations, may be
written

$$\mathcal{E}V(\bar{y}_{st}) = \frac{\sigma^2}{2k^2}\left[\sum_{u=1}^{k-1} u(2 + \rho_u + \rho_{2k-u}) + k(1 + \rho_k)\right] \qquad (8.39)$$

Similarly, $\mathcal{E}V(\bar{y}_{sy})$ may be expressed as

$$\mathcal{E}V(\bar{y}_{sy}) = \frac{\sigma^2}{2k^2}\left[\sum_{u=1}^{k-1} u(2 + 2\rho_k) + k(1 + \rho_k)\right] \qquad (8.40)$$

Hence

$$\mathcal{E}V(\bar{y}_{st}) - \mathcal{E}V(\bar{y}_{sy}) = \frac{\sigma^2}{2k^2}\left[\sum_{u=1}^{k-1} u(\rho_u + \rho_{2k-u} - 2\rho_k)\right] \qquad (8.41)$$

But if

$$\rho_{u+1} + \rho_{u-1} \ge 2\rho_u \qquad (u = 2, 3, \ldots)$$

it is easy to show that every term inside the brackets is positive. This completes
the proof. In short, the average distance apart is k for both the systematic
and the stratified sample, but because of the concavity the stratified sample
loses more in precision when the distance is less than k than it gains when the
distance exceeds k.
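
The n = 2 argument can be checked numerically for a particular correlogram. The sketch below is illustrative: the function names, the exponential correlogram $\rho_u = e^{-0.3u}$ and the linear correlogram $\rho_u = 1 - u/100$ are assumptions chosen as simple examples of the concave-upwards condition. It enumerates the $k^2$ position pairs directly, compares the result with (8.39), and then compares the systematic and stratified averages.

```python
import math


def ev_sy(rho, k, sigma2=1.0):
    """Average variance for the systematic pair, formula (8.38)."""
    return 0.5 * sigma2 * (1 + rho(k))


def ev_st(rho, k, sigma2=1.0):
    """Average over the k^2 position pairs, one unit from each of two strata."""
    total = 0.0
    for a in range(1, k + 1):              # position in the first stratum
        for b in range(k + 1, 2 * k + 1):  # position in the second stratum
            total += 0.5 * sigma2 * (1 + rho(b - a))
    return total / k ** 2


def ev_st_formula(rho, k, sigma2=1.0):
    """The same average written as in formula (8.39)."""
    s = sum(u * (2 + rho(u) + rho(2 * k - u)) for u in range(1, k))
    return sigma2 / (2 * k ** 2) * (s + k * (1 + rho(k)))


def exponential(u):
    return math.exp(-0.3 * u)       # strictly concave upwards


def linear(u):
    return max(0.0, 1 - u / 100)    # second differences are zero for u < 100


for k in (2, 5, 12):
    print(k, ev_sy(exponential, k), ev_st(exponential, k))
```

For the exponential correlogram the systematic average is strictly smaller; for the linear correlogram, where the second differences vanish over the relevant range, the two averages coincide.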

Quenouille (1949) has shown that the inequalities in theorem 8.6 remain valid
when two of the conditions are relaxed so that

$$\mathcal{E}(y_i) = \mu_i, \qquad \mathcal{E}(y_i - \mu_i)^2 = \sigma_i^2 \qquad (8.42)$$

In this event each of the three average variances is increased by the same amount.

As far as practical applications are concerned, correlograms that are concave
upward have been proposed by several writers as models for specific natural
populations. The function $\rho_u = \tanh(u^{-3/5})$ was suggested by Fisher and
Mackenzie (1922) for the correlation between the weekly rainfall at two weather stations
that are a distance u apart; the function $\rho_u = e^{-\lambda u}$ by Osborne (1942) and Matérn
(1947) for forestry and land use surveys; and the function $\rho_u = (l - u)/l$ by Wold
(1938) for certain types of economic time series.


8.10 NATURAL POPULATIONS 


Investigations have been made on a variety of natural populations. The data are 
described in Table 8.6. The first three studies were made from maps. In the first 
study the finite population consists of 288 altitudes at successive distances of 0.1 
mile in undulating country. In the next two the data are the fractions of the lengths
of lines drawn on a cover-type map that lie in a certain type of cover (e.g., grass). 
These examples might be considered the closest to continuous variation in the 
mathematical sense. 

The next three studies are based on temperatures for 192 consecutive days: (a) 
12 in. under the soil, (b) 4 in. under the soil, (c) in air. This trio represents a
gradation in the direction of greater influence of erratic day-to-day changes in the 
weather compared with slow seasonal influences. 

The remaining studies deal with plant or tree yields in sequences that lie along a 
line. In the study on potatoes, which is typical of the group, the finite population 
consists of the total yields of 96 rows in a field. Since no exhaustive search of the 
literature has been made, further data may be available. 

In some of the studies $V_{sy}$ is compared with the variance $V_{st2}$ for a stratified
random sample with strata of size 2k and two units per stratum. This comparison
is of interest because an unbiased estimate of $V_{st2}$ can be obtained from the sample
data. This cannot be done for $V_{st1}$ (with strata of size k and 1 unit per stratum) or
for $V_{sy}$. Other writers report comparisons of $V_{sy}$ with both $V_{st1}$ and $V_{st2}$. The
majority of the sources do not present comparisons with $V_{ran}$ in readily usable
form, but it appears that in general $V_{st2}$ gave gains in precision over $V_{ran}$.

TABLE 8.6
NATURAL POPULATIONS USED IN STUDIES OF SYSTEMATIC SAMPLING

Reference          N       Type of Data

Yates (1948),      288     Altitudes read at intervals of 0.1 mile from
  table 13                 ordnance survey map.
Osborne (1942)     *       Per cent of area in (a) cultivated land, (b) shrub,
                           (c) grass, (d) woodland on parallel lines drawn
                           on a cover-type map.
Osborne (1942)     *       Per cent of area in Douglas fir on parallel lines
                           drawn on a cover-type map.
Yates (1948)       192     Soil temperature (12 in. under grass) for 192
                           consecutive days.
Yates (1948)       192     Soil temperature (4 in. under bare soil) for 192 days.
Yates (1948)       192     Air temperature for 192 days.
Yates (1948)       96      Yields of 96 rows of potatoes.
Finney (1948)      160     Volume of saleable timber per strip, 3 chains
                           wide and of varying length (Mt. Stuart forest).
Finney (1948)      288     Volume of virgin timber per strip, 2.5 chains wide,
                           80 chains long (Black's Mountain forest).
Finney (1950)      292     Volume of timber per strip, 2 chains wide and of
                           varying length (Dehra Dun forest).
Johnson (1943)     400†    Number of seedlings per 1-ft-bed-width in 4 beds
                           of hardwood seedbed stock.
Johnson (1943)     400†    Number of seedlings per 1-ft-bed-width in 3 beds
                           of coniferous seedbed stock.
Johnson (1943)     400†    Number of seedlings per 1-ft-bed-width in 6 beds
                           of coniferous transplant stock.

* Theoretically, N is infinite, if lines that are infinitely thin can be envisaged.
† Approximately. The number varied from bed to bed.

In comparison with $V_{st1}$, systematic sampling shows a gain in precision which,
although modest, is worth having: the median of the ratios $V_{st1}/V_{sy}$ is 1.4. The
gains in comparison with $V_{st2}$ are substantial, the median ratio being 1.9.

TABLE 8.7
RELATIVE PRECISION OF SYSTEMATIC AND STRATIFIED RANDOM SAMPLING

                                              Relative Precision of
                                              Systematic to Stratified
                                   Range
Data                               of k       V_st1/V_sy     V_st2/V_sy
Altitudes                          2-20       2.99           5.68
Per cent area (4 cover types)      -          -              4.42
Per cent area (Douglas fir)        -          -              1.83
Soil temperature (12 in.)          2-24       2.42           4.23
Soil temperature (4 in.)           4-24       1.45           2.07
Air temperature                    4-24       1.26           1.65
Potatoes                           3-16       1.37           1.90
Timber volume (Mt. Stuart)         2-32       1.07           1.35
Timber volume (Black's Mt.)        2-24       1.19           1.44
Timber volume (Dehra Dun)          2-32       1.39           1.89
Hardwood seedlings                 14         -              1.89
Coniferous seedlings               14-24      -              2.22
Coniferous transplant              12-22      -              0.93


The internal trend of the results agrees with expectations, although not too 
much should be made of this in view of the small number of studies. The gains are 
largest for the types of data in which we would guess that variation would be 
nearest to continuous. The decline in $V_{st}/V_{sy}$ from soil to air temperatures would
also be anticipated from this viewpoint. In the last three items (forest nursery 
data), the only one showing no gain is coniferous transplant stock, which is older 
and more uniform than seedling stock. 


8.11 ESTIMATION OF THE VARIANCE FROM A 
SINGLE SAMPLE 


From the results of a simple random sample with n > 1, we can calculate an
unbiased estimate of the variance of the sample mean, the estimate being
unbiased whatever the form of the population. Since a systematic sample can be
regarded as a simple random sample with n = 1, this useful property does not hold
for the systematic sample. As an illustration, consider the "sine curve" example.
Let

$$y_i = m + a \sin\frac{\pi i}{2}$$

where k = 4 and i = 1, 2, ..., 4n. The successive observations in the population
are

(m + a), m, (m - a), m, (m + a), m, (m - a), m, ...

If i = 1 is chosen as the first member, all members of the systematic sample have
the value (m + a). For the other three possible choices of i, all members have the
values m, (m - a), or m, respectively. Thus from a single sample we have no means
of estimating the value of a. But the true sampling variance of the mean of the
systematic sample is $a^2/2$. The illustration shows that it is impossible to construct
an estimated variance that is unbiased if periodic variation is present.
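
The sine-curve illustration can be reproduced in a few lines. In the sketch below the values m = 10, a = 3 and n = 5 are arbitrary assumptions. Each of the four systematic samples is constant, so any estimator built from within-sample differences returns zero, while the true variance of the sample mean is $a^2/2$.

```python
import math

m, a, n = 10.0, 3.0, 5
k = 4
population = [m + a * math.sin(math.pi * i / 2) for i in range(1, k * n + 1)]

# The k possible systematic samples (random starts 1, 2, 3, 4).
samples = [population[start::k] for start in range(k)]

# Spread of the values inside each sample: zero apart from rounding error.
within_spread = [max(s) - min(s) for s in samples]

# True sampling variance of the systematic-sample mean.
grand_mean = sum(population) / len(population)
sample_means = [sum(s) / n for s in samples]
v_sy = sum((x - grand_mean) ** 2 for x in sample_means) / k

print(within_spread)
print(v_sy, a ** 2 / 2)
```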

These results do not mean that nothing can be done. Excluding the case of 
periodic variation, we might know enough about the structure of the population to 
be able to develop a mathematical model that adequately represents the type of 
variation present. We might then be able to manufacture a formula for the 
estimated variance that is approximately unbiased for this model, although it may 
be badly biased for other models. The decision to use one of these models must 
rest on the judgment of the sampler. 

Some simple models with their corresponding estimated variances are pre- 
sented below. No proofs are given. 


The simplest models apply to populations in which $y_i$ is composed of a trend
plus a "random" component. Thus

$$y_i = \mu_i + e_i$$

where $\mu_i$ is some function of i. For the random component, we assume that there is
a superpopulation in which

$$\mathcal{E}(e_i) = 0, \qquad \mathcal{E}(e_i^2) = \sigma_i^2, \qquad \mathcal{E}(e_i e_j) = 0 \quad (i \ne j)$$

A proposed formula $s_{sy}^2$ for the estimated variance is called unbiased if

$$\mathcal{E}E(s_{sy}^2) = \mathcal{E}V_{sy}$$

that is, if it is unbiased over all finite populations that can be drawn from the
superpopulation.

Population in "Random" Order

$$\mu_i = \text{constant} \qquad (i = 1, 2, \ldots, N)$$

$$s_{sy1}^2 = \frac{N-n}{Nn}\cdot\frac{\sum (y_i - \bar{y}_{sy})^2}{n-1} \qquad (8.43)$$

This case applies when we are confident that the order is essentially random with
respect to the items being measured. The variance formula is the same as that for a
simple random sample and is unbiased if the model is correct.

Stratification Effects Only

$$\mu_i \text{ constant} \qquad (rk + 1 \le i \le rk + k)$$

$$s_{sy2}^2 = \frac{N-n}{Nn}\cdot\frac{\sum (y_i - y_{i+k})^2}{2(n-1)} \qquad (8.44)$$

In this case the mean is constant within each stratum of k units. The estimate $s_{sy2}^2$,
which is based on the mean square successive difference, is not unbiased. It
contains an unwanted contribution from the difference between $\mu$'s in neighboring
strata, and the first and last strata carry too little weight in estimating the
random component of the variance. With a reasonably large sample, this estimate
would in general be too high, assuming that the model is correct.


Linear Trend

$$\mu_i = \mu + \beta i$$

$$s_{sy3}^2 = \frac{N-n}{N}\cdot\frac{n'}{n^2}\cdot\frac{\sum (y_i - 2y_{i+k} + y_{i+2k})^2}{6(n-2)} \qquad (1 \le i \le n-2) \qquad (8.45)$$

The estimate is based on successive quadratic terms in the sequence $y_i$. The sum of
squares contains (n - 2) terms. With a linear trend we have seen (section 8.7) that
the trend can be eliminated by the use of end corrections. The term $n'/n^2$ is the
sum of squares of the weights in $\bar{y}_{wsy}$. Unless n is small, $n'/n^2$ can be replaced by
the usual factor 1/n. Because the strata at the ends receive too little weight, the
estimate is biased unless $\sigma_i^2$ is constant, but it should be satisfactory if n is large
and the model is correct.
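
The three model-based estimators can be written as short functions of the ordered systematic sample. The sketch below is one reading of (8.43) to (8.45), not a definitive implementation: the function names are illustrative, `sample` holds the n observations in the order selected (so neighbours in the list are k units apart in the population), and in the third estimator $n'/n^2$ is replaced by 1/n as the text permits when n is not small.

```python
def s2_sy1(sample, N):
    """Random-order model, formula (8.43): the simple random sampling estimate."""
    n = len(sample)
    mean = sum(sample) / n
    ss = sum((y - mean) ** 2 for y in sample)
    return (N - n) / (N * n) * ss / (n - 1)


def s2_sy2(sample, N):
    """Stratification model, formula (8.44): successive differences."""
    n = len(sample)
    ss = sum((sample[j] - sample[j + 1]) ** 2 for j in range(n - 1))
    return (N - n) / (N * n) * ss / (2 * (n - 1))


def s2_sy3(sample, N):
    """Linear-trend model, formula (8.45), with n'/n^2 taken as 1/n."""
    n = len(sample)
    ss = sum((sample[j] - 2 * sample[j + 1] + sample[j + 2]) ** 2
             for j in range(n - 2))
    return (N - n) / (N * n) * ss / (6 * (n - 2))


# A sample that follows an exact linear trend: start 3, interval k = 10, N = 50.
trend_sample = [3, 13, 23, 33, 43]
print(s2_sy1(trend_sample, 50), s2_sy2(trend_sample, 50), s2_sy3(trend_sample, 50))
```

On an exact linear trend the second differences vanish, so the third formula gives zero while the first two attribute the trend to sampling error.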

If continuous variation of a more complex type is present, the preceding 
formulas may give poor results. In Table 8.8 the second and third formulas are 
applied to six forest nursery beds (Johnson, 1943). The quadratic formula is 
slightly better than that based on successive differences, but both give consistently 
serious overestimates. 


TABLE 8.8
VARIANCES OF SAMPLE MEAN NUMBERS OF SEEDLINGS (JOHNSON'S DATA)

                         Actual
                 Bed     V_sy      s_sy2^2     s_sy3^2
Silver maple      1      0.91       2.8         2.5
                  2      0.74       3.6         2.9
American elm      1      4.8       28.4        12.6
                  2     15.5       22.6        18.6
White spruce      1      5.5       17.2        11.2
                  2      2.0       11.6         6.4
White pine               8.2       21.0        21.9

Formulas developed from simple assumptions about the nature of the
correlogram have been given by Osborne (1942), Cochran (1946), and Matérn
(1947). If the successive observations in the systematic sample are denoted by $y_1'$,
$y_2'$, and so forth, Yates (1949) suggested an estimator based on differences $d_u$, of
which the first is

$$d_1 = (\tfrac{1}{2}y_1' + y_3' + y_5' + y_7' + \tfrac{1}{2}y_9') - (y_2' + y_4' + y_6' + y_8') \qquad (8.46)$$

The next difference, $d_2$, may start with $y_9'$, and so on. For the estimated variance of
$\bar{y}_{sy}$ we take

$$s_{sy}^2 = \frac{N-n}{Nn}\sum_{u=1}^{g}\frac{d_u^2}{7.5g} \qquad (8.47)$$

The factor 7.5 is the sum of squares of the coefficients in any $d_u$, and g is the
number of differences that the sample provides (g = n/9). In natural populations
that Yates examined, a formula of this type was superior to $s_{sy2}^2$ based on
successive differences but still overestimated $V(\bar{y}_{sy})$.

In summary, there is no dearth of formulas for the estimated variance, but all
appear to have limited applicability.

With N = nk, suppose that n is divisible by an integer m (say 10). The following
method uses systematic sampling in part and provides an unbiased sample
estimate of $V(\bar{y}_{sy})$ based on (m - 1) degrees of freedom. Draw a simple random
sample of size m from the units numbered 1 to mk. For every unit in this sample,
take also every (mk)th thereafter. In effect, this method divides the population
into mk clusters each of size N/mk = n/m, and chooses m clusters at random, so
that we have a simple random sample of m clusters. For instance, suppose that we
want a 20% sample from a population with N = 2400, so that n = 480, k = 5. Take
a simple random sample of size m = 10 from units numbered 1 to mk = 50, and
take every 50th unit thereafter. We then have 10 cluster samples each of size
2400/50 = 48.
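
The method can be sketched on a toy population small enough to enumerate every possible set of random starts. The code below is illustrative and makes two assumptions not spelled out in the text: the population mean is estimated by the unweighted mean of the m cluster means, and the variance estimate is the usual one for a simple random sample of m out of mk equal-sized clusters, including the finite population correction 1 - m/(mk). The population values are arbitrary.

```python
from fractions import Fraction
from itertools import combinations


def estimate_and_variance(pop, starts, k):
    """Mean estimate and its estimated variance from m systematic clusters."""
    m = len(starts)
    step = m * k                                   # take every (mk)th unit
    clusters = [pop[s - 1::step] for s in starts]  # starts are 1-based
    means = [Fraction(sum(c), len(c)) for c in clusters]
    estimate = sum(means) / m
    s2_between = sum((x - estimate) ** 2 for x in means) / (m - 1)
    v_hat = (1 - Fraction(m, step)) * s2_between / m
    return estimate, v_hat


# Toy population: N = 24, k = 2 (so n = 12), m = 3 random starts from 1..6.
pop = [(7 * i * i + 3 * i) % 19 for i in range(1, 25)]
k, m = 2, 3

results = [estimate_and_variance(pop, starts, k)
           for starts in combinations(range(1, m * k + 1), m)]

pop_mean = Fraction(sum(pop), len(pop))
true_variance = sum((est - pop_mean) ** 2 for est, _ in results) / len(results)
mean_v_hat = sum(v for _, v in results) / len(results)

print(true_variance, mean_v_hat)
```

Averaged over all equally likely choices of the m starts, the variance estimate equals the true variance of the estimated mean, which is the sense in which the estimate is unbiased.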

Gautschi (1957) has examined the accuracy of this method under the
population structures considered in this chapter. As might be anticipated, the accuracy
lies between those of simple random sampling and of systematic sampling with
m = 1.


8.12 STRATIFIED SYSTEMATIC SAMPLING 
This is suitable if separate estimates are wanted for each stratum or if unequal
sampling fractions are to be used. The method will be more precise than stratified
random sampling if systematic sampling within strata is more precise than simple
random sampling within strata.

If $\bar{y}_{syh}$ is the mean of the systematic sample in stratum h, the estimate of the
population mean $\bar{Y}$ and its variance are

$$\bar{y}_{stsy} = \sum W_h \bar{y}_{syh}, \qquad V(\bar{y}_{stsy}) = \sum W_h^2 V(\bar{y}_{syh})$$

With only a few strata, the problem of finding a sample estimate of this quantity
amounts to that already discussed of finding a satisfactory sample estimate of
$V(\bar{y}_{syh})$ in each stratum.

When the strata are more numerous, an estimate based on the method of
“collapsed strata” (section 5A.12) may be preferable. From the results in that
section, it follows that the estimate

$$v(\bar{y}_{stsy}) = \sum{}' \, W_h^2 (\bar{y}_{syh} - \bar{y}_{syj})^2 \qquad (8.48)$$

where the sum extends over the pairs of strata (h, j), is on the average an overestimate,
even if periodic variation is present within the strata.

An unbiased estimate of the error variance can be obtained if two systematic 
samples, with a different random start and an interval 2k, are drawn within each 
stratum, one df being provided by each stratum. There will be some loss in 
precision if systematic sampling is effective. If there are many strata, one systema- 
tic sample can be used in most of them, drawing two in a random subsample of 
strata for the purposes of estimating the error. 


8.13 SYSTEMATIC SAMPLING IN TWO DIMENSIONS 


In sampling an area, the simplest extension of the one-dimensional systematic 
sample is the “square grid” pattern shown in Fig. 8.4a. The sample is completely 
determined by the choice of a pair of random numbers to fix the coordinates of the 
upper left unit. The performance of the square grid has been studied both on
theoretical and natural populations. Matérn (1960) has investigated the best type
of sample when the correlation between any two points in the area is a monotone
decreasing concave upward function of their distance apart d. For correlograms
like $e^{-\lambda d}$ the square grid does well, being superior to simple or stratified random
sampling with one unit per stratum, although Matérn gives reasons for expecting
that the best pattern for this situation is a triangular network in which the points lie
at the vertices of equilateral triangles.

In 14 agricultural uniformity trials, Haynes (1948) found that the square grid
had about the same precision as simple random sampling in two dimensions.
Milne (1959) examined the central square grid, in which the point lies at the center
of the square, in 50 uniformity trials. It performed better than simple random
sampling and perhaps slightly better than stratified random sampling, although
this difference was not statistically significant. These results suggest that, at least
for data of this type, autocorrelation effects are weak. For estimating the area
covered by forest or by water on a map, Matérn found the square grid superior to
the random method in two examples.

Fig. 8.4 Two types of two-dimensional systematic sample: (a) aligned or “square grid” sample; (b) unaligned sample.

Figure 8.4b shows an alternative systematic sample, called an unaligned
sample. The coordinates of the upper left unit are selected first by a pair of random
numbers. Two additional random numbers determine the horizontal coordinates
of the remaining two units in the first column of strata. Another two are needed to
fix the vertical coordinates of the remaining units in the first row of strata. The
constant interval k (equal to the sides of the squares) then fixes the locations of all
points. Investigations by Quenouille (1949) and Das (1950) for simple
two-dimensional correlograms indicate that the unaligned pattern will often be
superior both to the square grid and to stratified random sampling.
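
The two selection schemes can be sketched as follows. The function names and the use of continuous coordinates are assumptions of this sketch: each row of strata receives one random horizontal offset and each column of strata one random vertical offset, which is one reading of the description of the unaligned sample above.

```python
import random


def aligned_grid(rows, cols, k, rng=random):
    """Square grid: one random point (x0, y0), repeated at interval k in both directions."""
    x0, y0 = rng.random() * k, rng.random() * k
    return [(j * k + x0, i * k + y0) for i in range(rows) for j in range(cols)]


def unaligned_grid(rows, cols, k, rng=random):
    """Unaligned sample: one point in each k-by-k stratum.

    x_off[i] : horizontal offset shared by every stratum in row i
               (fixed by the units in the first column of strata)
    y_off[j] : vertical offset shared by every stratum in column j
               (fixed by the units in the first row of strata)
    """
    x_off = [rng.random() * k for _ in range(rows)]
    y_off = [rng.random() * k for _ in range(cols)]
    return [(j * k + x_off[i], i * k + y_off[j])
            for i in range(rows) for j in range(cols)]


if __name__ == "__main__":
    for point in unaligned_grid(rows=3, cols=3, k=10.0, rng=random.Random(0)):
        print(tuple(round(c, 2) for c in point))
```

For a 3 x 3 arrangement of strata this uses 3 + 3 = 6 random numbers: the pair for the upper left unit plus the two further pairs mentioned in the text.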

Further evidence of the superiority of an unaligned sample is obtained from
experience in experimental design, in which the latin square has been found a
precise method for arranging treatments in a rectangular field. The 5 x 5 latin
square in Fig. 8.5a may be regarded as a division of the field into five systematic
samples, one for each letter. There is some evidence that this particular square,
which is called the “knight's move” latin square, is slightly more precise than a
randomly chosen 5 x 5 square, probably because alignment is absent in the
diagonals as well as in rows and columns.
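
A small sketch makes the absence of alignment concrete. It assumes that the knight's move square can be generated by shifting each successive row three places (so that each letter moves one row down and two columns across), and the helper names are hypothetical. The check covers wrap-around diagonals as well as the two main diagonals, which is somewhat stronger than the statement in the text.

```python
def knights_move_square(p=5, shift=3):
    """Cyclic p x p square in which row i is row 0 shifted by shift*i places."""
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    return [[letters[(j + shift * i) % p] for j in range(p)] for i in range(p)]


def all_distinct_lines(square):
    """True if no letter repeats in any row, column, or wrap-around diagonal."""
    p = len(square)
    rows = [list(r) for r in square]
    cols = [[square[i][j] for i in range(p)] for j in range(p)]
    diag = [[square[i][(i + d) % p] for i in range(p)] for d in range(p)]
    anti = [[square[i][(d - i) % p] for i in range(p)] for d in range(p)]
    return all(len(set(line)) == p for line in rows + cols + diag + anti)


if __name__ == "__main__":
    square = knights_move_square()
    for row in square:
        print(" ".join(row))
    print(all_distinct_lines(square))                 # knight's move square
    print(all_distinct_lines(knights_move_square(5, 1)))  # ordinary cyclic square
```

An ordinary cyclic square (shift of one place per row) is still a latin square, but one family of its diagonals carries a single letter throughout, so it fails the diagonal check that the knight's move square passes.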

The principle of the latin square has been used by Homeyer and Black (1946) in
sampling rectangular fields of oats. Each field contained 21 plots. The three
possible systematic samples are denoted by the letters A, B, and C, respectively, in
Fig. 8.5b. This arrangement, with one of the letters chosen at random in each field,
gave an increase in precision of around 25% over stratified random sampling with
rows as strata. The arrangement does not quite satisfy the latin square property
because each letter appears three times in one column and twice in the other
columns, but it approaches this property as nearly as possible.

(a) “Knight's move” latin square      (b) Systematic design for a 3 x 7 rectangular field

        A B C D E                              A B C
        D E A B C                              B C A
        B C D E A                              C A B
        E A B C D                              A B C
        C D E A B                              B C A
                                               C A B
                                               A B C

Fig. 8.5 Two systematic designs based on the latin square.


Yates (1960), who terms arrangements of this type lattice sampling, discusses
their use in two- and three-dimensional sampling. In three dimensions each row,
column, and vertical level can be represented in the sample by choosing p units out
of the $p^3$ in the population. With $p^2$ units in the sample, each of the $p^2$
combinations of levels of rows and columns, of rows and vertical heights, and of
columns and vertical heights can be represented. Patterson (1954) has investigated
the arrangements that provide an unbiased estimate of error.


8.14 SUMMARY 


Systematic samples are convenient to draw and to execute. In most of the 
studies reported in this chapter, both on artificial and on natural populations, they 
compared favorably in precision with stratified random samples. Their disadvantages
are that they may give poor precision when unsuspected periodicity is
present and that no trustworthy method for estimating $V(\bar{y}_{sy})$ from the sample
data is known.

In the light of these results systematic sampling can safely be recommended in
the following situations.


1. Where the ordering of the population is essentially random or contains at 
most a mild stratification. Here systematic sampling is used for convenience, with 
little expectation of a gain in precision. Sample estimates of error that are 
reasonably unbiased are available (section 8.11). 

2. Where a stratification with numerous strata is employed and an independent 
systematic sample is drawn from each stratum. The effects of hidden periodicities 
tend to cancel out in this situation, and an estimate of error that is known to be an 
overestimate can be obtained (section 8.12). Alternatively, we can use half the 
number of strata and draw two systematic samples, with independent random 
starts, from each stratum. This method gives an unbiased estimate of error. 

3. For subsampling cluster units (Chapter 10). In this case an unbiased or 
almost unbiased estimate of the sampling error can be obtained in most practical 
situations. This is a common use of systematic sampling. 

4. For sampling populations with variation of a continuous type, provided that 
an estimate of the sampling error is not regularly required. If a series of surveys of
this type is being made, an occasional check on the sampling errors may be
sufficient. Yates (1948) has shown how this may be done by taking supplementary 
observations. 


EXERCISES 


8.1 The data in the table (p.230) are the numbers of seedlings for each foot of bed in a 
bed 200 ft long. 

Find the variance of the mean of a systematic sample consisting of every twentieth foot. 
Compare this with the variances for (a) a simple random sample, (b) a stratified random 
sample with two units per stratum, (c) a stratified random sample with one unit per stratum. 
All samples have n = 10. [$\sum (y_i - \bar{Y})^2$ = 23,601.]

8.2 A population of 360 households (numbered 1 to 360) in Baltimore is arranged 
alphabetically in a file by the surname of the head of the household. Households in which 
the head is nonwhite occur at the following numbers: 28, 31-33, 36-41, 44, 45, 47, 55, 56, 
58, 68, 69, 82, 83, 85, 86, 89-94, 98, 99, 101, 107-110, 114, 154, 156, 178, 223, 224, 296, 
298-300, 302-304, 306-323, 325-331, 333, 335-339, 341, 342. (The nonwhite households
show some “clumping” because of an association between surname and color.)

Compare the precision of a 1-in-8 systematic sample with a simple random sample of the 
same size for estimating the proportion of households in which the head is nonwhite. 

8.3 A neighborhood contains three compact communities, consisting, respectively, of 
people of Anglo-Saxon, Polish, and Italian descent. There is an up-to-date directory. In it
the persons in a house are listed in the following order: husband, wife, children (by age),
others. Houses are listed in order along streets. The average number of persons per house is
five.

The choice is between a systematic sample of every fifth person in the directory and a 
20% simple random sample. For which of the following variables do you expect the 
systematic sample to be more precise? (a) Proportion of people of Polish descent, (b) 
proportion of males, (c) proportion of children. Give reasons. 

8.4 In a directory of 13 houses on a street the persons are listed as follows: M = male
adult, F = female adult, m = male child, f = female child. 


Household 
1 2 3 4 5 6 7 8 9 10 11 12 13
M M WE AEE AVE Sele DIVE M M M M M M 
F E 18 F E F 1m F Ja F F E P 
Lif Us m Doe Ip f m m mes if if 
ma T: rn itt Ti fom 
f if. m 


Compare the variances given by a systematic sample of one in five persons and a 20% 
simple random sample for estimating (a) the proportion of males, (b) the proportion of 
children, (c) the proportion of persons living in professional households (households 1, 2, 3, 
12, and 13 are described as professional). Do the results support your answers to exercise
8.3? For the systematic sample, number down each column, then go to the top of the next
column.


8.5 In exercise 8.1 we might estimate $V(\bar{y}_{sy})$ by (a) regarding each systematic sample as
a simple random sample, (b) pretending that each 1-in-20 systematic sample is composed
of two 1-in-40 systematic samples with a separate random start. For each method, compare
the average of the estimated variances with the actual variance of $\bar{y}_{sy}$.

8.6 In a population consisting of a linear trend (section 8.6) show that a systematic
sample is less precise than a stratified random sample with strata of size 2k and two units
per stratum if n > (4k + 2)/(k + 1).

8.7 A two-dimensional population with a linear trend may be represented by the
relation

$$y_{ij} = i + j \qquad (i, j = 1, 2, \ldots, nk)$$

where $y_{ij}$ is the value in the ith row and jth column. The population contains $N^2 = n^2k^2$
units.

A systematic square grid sample is selected by drawing at random two independent
starting coordinates $i_0$, $j_0$, each between 1 and k. The sample, of size $n^2$, contains all units
whose coordinates are of the form

$$i_0 + \gamma k, \qquad j_0 + \delta k$$

where $\gamma$, $\delta$ are any two integers between 0 and (n - 1), inclusive.

Show that the mean of this sample has the same precision as the mean of a simple random
sample of size $n^2$.


8.8 If the comparison in exercise 8.7 were made for a three-dimensional population 
with linear trend, what result would you expect? 


8.9 In a population with $y_i = i^2$ (i = 1, 2, ..., 16), compare the values of $E(\bar{y} - \bar{Y})^2$
given by every kth systematic sampling and by the Yates, Sethi, and Singh et al. methods for
n = 4, k = 4, N = 16.

Single-Stage Cluster Sampling:
Clusters of Equal Sizes


9.1 REASONS FOR CLUSTER SAMPLING 


Several references have been made in preceding chapters to surveys in which 
the sampling unit consists of a group or cluster of smaller units that we have called
elements or subunits. There are two main reasons for the widespread application 
of cluster sampling. Although the first intention may be to use the elements as 
sampling units, it is found in many surveys that no reliable list of the elements in 
the population is available and that it would be prohibitively expensive to 
construct such a list. In many countries there are no complete and up-to-date lists 
of the people, the houses, or the farms in any large geographic region. From maps 
of the region, however, it can be divided into areal units such as blocks in the cities 
and segments of land with readily identifiable boundaries in the rural parts. In the 
United States these clusters are often chosen because they solve the problem of 
constructing a list of sampling units. 

Even when a list of individual houses is available, economic considerations may 
point to the choice of a larger cluster unit. For a given size of sample, a small unit 
usually gives more precise results than a large unit. For example, a simple random 
sample of 600 houses covers a town more evenly than 20 city blocks containing an
average of 30 houses apiece. But greater field costs are incurred in locating 600 
houses and in travel between them than in locating 20 blocks and visiting all the 
houses in these blocks. When cost is balanced against precision, the larger unit 
may prove superior. 

A rational choice between two types or sizes of unit may be made by the familiar 
principle of selecting the unit that gives the smaller variance for a given cost or the 
smaller cost for a prescribed variance. As in many practical decisions, there may 
be imponderable factors: one type of unit may have some special convenience or 
disadvantage that is difficult to include in a calculation of costs. In sampling a 
growing crop, some experiences suggest that a small unit may give biased 
estimates because of uncertainty about the exact boundaries of the unit. Homeyer 
and Black (1946) found that units 2 x 2 ft gave yields of oats about 8% higher than
units 3 x 3 ft, possibly because samplers tend to place boundary plants inside the
unit when there is doubt. Sukhatme (1947) cites similar results for wheat and rice.

This chapter deals with the case in which every cluster unit contains the same
number of elements or subunits. 


9.2 A SIMPLE RULE 


When the problem is to compare a few specific sizes or types of unit, the 
following result is helpful. 


Theorem 9.1. This applies to simple random sampling in which the fpc is
negligible. The quantity to be estimated is the population total. For the uth type of
unit, let

$M_u$ = relative size of unit
$S_u^2$ = variance among the unit totals
$C_u$ = relative cost of measuring one unit

Then relative cost for specified variance or relative variance for specified cost
$\propto C_u S_u^2 / M_u^2$.


Proof. Suppose V is the specified variance of the estimated population total.
For the uth type of unit the estimate is $N_u \bar{y}_u$, with variance

$$V = \frac{N_u^2 S_u^2}{n_u} \qquad (9.1)$$

The cost of taking $n_u$ units is $C_u n_u = C_u N_u^2 S_u^2 / V$. Since $N_u M_u$ = constant for
different types of unit, the cost is proportional to $C_u S_u^2 / M_u^2$. On the other hand, if
the cost C is specified, $n_u = C / C_u$ and, in (9.1), $V \propto C_u S_u^2 / M_u^2$. This completes
the proof.


Corollary 1. If we define the relative net precision of a unit as inversely
proportional to the variance obtained for fixed cost, theorem 9.1 may be stated as

$$\text{relative net precision} \propto \frac{M_u^2}{C_u S_u^2} \qquad (9.2)$$


Corollary 2. In the analysis of variance, the variances for units of different
sizes are often computed on what is called a common basis, usually that applicable
to the smallest unit. To put the variances on a common basis, the variance $S_u^2$
among the totals of units of size $M_u$ is divided by $M_u$. Let

$S_u'^2 = S_u^2 / M_u$ = variance among unit totals (on a common basis)
$C_u' = C_u / M_u$ = relative cost of taking a given bulk of sample

Then theorem 9.1 and corollary 1 may be stated as follows.

$$\text{relative cost for equal precision} \propto C_u' S_u'^2 \qquad (9.3)$$

$$\text{relative net precision} \propto \frac{1}{C_u' S_u'^2} \qquad (9.4)$$

If differences in the costs of taking the sample are ignored (i.e., if $C_u'$ is
constant), the relative net precision with the uth unit $\propto 1/S_u'^2$. Kish's deff factors
for the different units (section 4.11) are therefore proportional to the $S_u'^2 = S_u^2 / M_u$.
Example. Johnson's data (1941) for a bed of white pine seedlings provide a simple
example. The bed contained six rows, each 434 ft long. There are many ways in which the
bed can be divided into sampling units. Data for four types of unit are shown in Table 9.1.
Since the bed was completely counted, the data are correct population values.


TABLE 9.1
DATA FOR FOUR TYPES OF SAMPLING UNIT

                                                 Type of Unit
Preliminary Data                      1-ft row   2-ft row   1-ft bed   2-ft bed
$M_u$ = relative size of unit             1          2          6         12
$N_u$ = number of units in pop.        2604       1302        434        217
$S_u^2$ = pop. variance per unit      2.537      6.746     23.094     68.558
Number of feet of row that can
  be counted in 15 min                   44         62         78        108

The units were 


1 ft of a single row 
2 ft of a single row 
1 ft of the width of the bed 
2 ft of the width of the bed 


With the first two units it was assumed that sampling would be stratified by rows, so that
the $S_u^2$ represent variances within rows. Simple random sampling was assumed for the last
two units.

Since the principal cost is that of locating and counting the units, costs were estimated by
a time study (last row of Table 9.1). With the larger units, a greater bulk of sample can be
counted in 15 min, less time being spent in moving from one unit to another.

The quantity to be estimated is the total number of seedlings in the bed. In the notation of
theorem 9.1, Table 9.1 gives the value of $M_u$ and $S_u^2$. The relative values of $C_u$, expressed as
the time required to count one unit, are as follows.

                              1-ft row   2-ft row   1-ft bed   2-ft bed
$C_u$ (in 15-min times)         1/44       2/62       6/78      12/108


By theorem 9.1, corollary 1, the relative net precisions are worked out in Table 9.2.
The last column of Table 9.2 gives the relative precisions when that of the smallest unit is
taken as 100. The 1-ft bed appears to be the best unit.


TABLE 9.2
RELATIVE NET PRECISIONS OF THE FOUR UNITS

Unit        $M_u^2 / (C_u S_u^2)$                    Relative precision
1-ft row    44/2.537 = 17.34                            100
2-ft row    (4)(62)/[(2)(6.746)] = 18.38                106
1-ft bed    (36)(78)/[(6)(23.094)] = 20.27              117
2-ft bed    (144)(108)/[(12)(68.558)] = 18.90           109


The variances among units, expressed on a common basis, are also worth looking at. The
values of $S_u'^2 = S_u^2 / M_u$, applicable to a single foot of row, are, respectively, 2.537, 3.373,
3.849, 5.713. Note that these variances increase steadily with increasing size of unit. This
result is commonly found (although exceptions may occur). Since the relative net precision
$\propto 1/(C_u' S_u'^2)$, the cost of taking a given bulk of sample must decrease with the larger units if
they are to prove economical.
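
The arithmetic of this example is easy to reproduce. The sketch below takes the values of $M_u$, $S_u^2$, and feet counted per 15 min from Table 9.1; the variable names are illustrative only.

```python
units = ["1-ft row", "2-ft row", "1-ft bed", "2-ft bed"]
M = [1, 2, 6, 12]                          # relative size of unit
S2 = [2.537, 6.746, 23.094, 68.558]        # population variance per unit
feet_per_15_min = [44, 62, 78, 108]        # from the time study

# Cost of one unit, in 15-min periods: feet in the unit / feet counted per period.
C = [m / f for m, f in zip(M, feet_per_15_min)]

# Corollary 1: relative net precision is proportional to M^2 / (C * S^2).
net_precision = [m * m / (c * s2) for m, c, s2 in zip(M, C, S2)]
relative = [round(100 * p / net_precision[0]) for p in net_precision]

# Variances on a common (single foot of row) basis.
S2_common = [s2 / m for s2, m in zip(S2, M)]

if __name__ == "__main__":
    for name, p, r, v in zip(units, net_precision, relative, S2_common):
        print(f"{name}: net precision {p:.2f}, relative {r}, common-basis variance {v:.3f}")
```

The common-basis variances rise with the size of unit while the cost per foot falls, and the 1-ft bed is where the balance between the two is most favorable for these data.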


Theorem 9.1 and its corollaries remain valid for stratified sampling with
proportional allocation if all strata are of the same size and if $S_u^2$, $S_u'^2$ represent
average variances within strata. This is so, under the conditions stated, because
the variance of the estimated population total, ignoring the fpc, is $N_u^2 S_u^2 / n_u$, and
therefore assumes the same form as in simple random sampling. Theorem 9.1
does not hold for more complex types of sampling.

The preceding results are intended merely as an illustration of the general 
procedure. Comparisons among units should always be made for the kind of 
sampling that is to be used in practice or, if this has not been decided, for the kinds 
that are under consideration. Changes in the method of sampling or of estimation 
will alter the relative net precisions of the different units. Even with a fixed method
of sampling and estimation, relative net precisions vary with size of sample if the
cost is not a linear function of size or if the size is large enough so that the fpc must
be taken into account.

There is usually more than one item to consider. One approach is to fix the total
cost and work out the relative net precisions for each type of unit and each item.
Unless one type is uniformly superior, some compromise decision is made, giving
principal weight to the most important items.
TABLE 9.3
ESTIMATED STANDARD ERRORS (%) FOR FOUR SIZES OF UNIT, WITH SIMPLE
RANDOM SAMPLING

                                                                          Best
Items                              S/4        S/2        S          2S    Unit
Number of swine                    5.0        4.9        5.3        6.2   S/2
Number of horses                   3.4        3.3        3.6        4.2   S/2
Number of sheep                   17.4       15.7       14.9       14.3   2S
Number of chickens                 3.0        3.0        3.3        3.8   S/4, S/2
Number of eggs yesterday           5.7        5.2        4.9        4.7   2S
Number of cattle                   4.7        4.6  [illegible] [illegible] S/2
Number of cows milked              3.7        3.6        3.8        4.4   S/2
Number of gallons of milk          4.4        4.2        4.4        4.9   S/2
Dairy products receipts            4.8        3.2        5.4        6.0   S/2
Number of farm acres               2.9        2.8        3.0        3.5   S/2
Number of corn acres               3.7        3.5        3.8        4.4   S/2
Number of oat acres                4.6        4.8        5.6        7.0   S/4
Corn yield                         1.6        1.7        2.0        2.5   S/4
Oat yield                          1.6        1.5        1.6        1.8   S/2
Commercial feed expenditures      12.6       13.6       16.7       21.8   S/4
Total expenditures, operator       7.8        8.1        9.6       12.0   S/4
Total receipts, operator           6.2        6.5        7.7        9.8   S/4
Net cash income, operator          6.8        6.9        7.8        9.5   S/4


In view of the numerous factors that influence the results, a study of optimum 
size of unit in an extensive survey is a large task. A good example for farm 
sampling is described by Jessen (1942). An excerpt from his results is given in 
Table 9.3. This compares four sizes of unit—a quarter-section, a half-section, a 
section, and a block consisting of two contiguous sections. The section is an area 1 
mile square, containing on the average slightly under four farms. In this compari- 
son the total field cost ($1000), the length of questionnaire (60 min to complete), 
and the travel cost (5 cents per mile) are all specified, because relative net 
precisions change if any of these variables is altered. Costs are at a 1939 level. 

The data in the table are the relative standard errors (in per cent) of the 
estimated means per farm for 18 items. No unit is best for all items. The 
half-section and the quarter-section are, however, superior to the larger units for 
all except two items, with little to choose between the half- and quarter-sections. 
The half-section would probably be preferred, because the problem of identifying 
the boundaries accurately is easier. 
9.3 COMPARISONS OF PRECISION MADE FROM SURVEY DATA


In the nursery seedling example the variances for the different types of unit 
were obtained from a complete count of the population. Except with small 
populations, however, it is seldom feasible to conduct a survey solely for the
purpose of comparison. Information about the optimum unit is more usually 
procured as an ingenious by-product of a survey whose main purpose is to make 
estimates. 

Suppose that in a survey each unit can be divided into M smaller units. Instead 
of recording only the totals for each “large” unit in the sample, we record data 
separately for each of the M small units. A comparison can then be made of the 
precision of the large and small units. A simple random sample of size n will be 
assumed at first. 


The analysis of variance in Table 9.4 can be computed from the sample.


TABLE 9.4
ANALYSIS OF VARIANCE OF THE SAMPLE DATA (ON A SMALL-UNIT BASIS)

                                            df          ms
Between large units                        n - 1       $s_b^2$
Between small units within large units     n(M - 1)    $s_w^2$
Between small units in sample              nM - 1      $s^2 = [(n-1)s_b^2 + n(M-1)s_w^2] / (nM-1)$


The estimated variance of a large unit (on a small-unit basis) is $s_b^2$. It might be
thought that an appropriate estimate of the variance of a small unit would be the
mean square between all small units in the sample; that is,

$$s^2 = \frac{(n-1)s_b^2 + n(M-1)s_w^2}{nM-1} \qquad (9.5)$$


This estimate, although often satisfactory, is slightly biased because the sample is 
not a simple random sample of small units, since these are sampled in contiguous 
groups of M units. 

An unbiased estimate is obtained from the sample by constructing an analysis of 
variance, as in Table 9.5, for the whole population, which contains N large units 
and NM small units. 


By its definition, the population variance among small units is given by the last
line of Table 9.5, that is,

$$S^2 = \frac{(N-1)S_b^2 + N(M-1)S_w^2}{NM-1} \qquad (9.6)$$


TABLE 9.5
ANALYSIS OF VARIANCE FOR THE WHOLE POPULATION (ON A SMALL-UNIT BASIS)

                                            df          ms
Between large units                        N - 1       $S_b^2$
Between small units within large units     N(M - 1)    $S_w^2$
Between small units in the population      NM - 1      $S^2 = [(N-1)S_b^2 + N(M-1)S_w^2] / (NM-1)$


With simple random sampling, $s_b^2$ in Table 9.4 is an unbiased estimate of $S_b^2$ (this
follows from section 2.3). It may be shown easily that $s_w^2$ is an unbiased estimate of
$S_w^2$. Hence an unbiased estimate of the variance $S^2$ among all small units in the
population is

$$\hat{S}^2 = \frac{(N-1)s_b^2 + N(M-1)s_w^2}{NM-1} \qquad (9.7)$$

Clearly, this expression is almost the same as the simpler expression

$$\hat{S}^2 = \frac{s_b^2 + (M-1)s_w^2}{M} \qquad (9.8)$$

If n > 50, (9.5) for $s^2$ also reduces to (9.8), so that $s^2$ is a satisfactory approximation
to $\hat{S}^2$ for n > 50.
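
The bias of the pooled mean square and the unbiasedness of the weighted form can be checked exactly on a toy population by enumerating every possible sample of clusters. The population values below are invented purely for illustration, and the function names are hypothetical; with so few clusters the bias is far larger than it would be in a practical survey.

```python
from itertools import combinations


def mean_squares(clusters):
    """Between- and within-cluster mean squares on a small-unit basis."""
    n, M = len(clusters), len(clusters[0])
    grand = sum(sum(c) for c in clusters) / (n * M)
    cluster_means = [sum(c) / M for c in clusters]
    sb2 = M * sum((cm - grand) ** 2 for cm in cluster_means) / (n - 1)
    sw2 = sum((v - cm) ** 2
              for c, cm in zip(clusters, cluster_means) for v in c) / (n * (M - 1))
    return sb2, sw2


def pooled(sb2, sw2, units, M):
    """df-weighted combination used in (9.5), (9.6), and (9.7)."""
    return ((units - 1) * sb2 + units * (M - 1) * sw2) / (units * M - 1)


def average(values):
    return sum(values) / len(values)


# Invented population: N = 6 large units, each holding M = 2 small units.
population = [[1, 2], [2, 4], [5, 5], [7, 9], [10, 12], [11, 15]]
N, M, n = len(population), 2, 3

S2 = pooled(*mean_squares(population), N, M)   # population variance, as in (9.6)

naive, unbiased, shortcut = [], [], []
for sample in combinations(population, n):     # every simple random sample of n clusters
    sb2, sw2 = mean_squares(sample)
    naive.append(pooled(sb2, sw2, n, M))          # pooled sample mean square, (9.5)
    unbiased.append(pooled(sb2, sw2, N, M))       # weights from the population, (9.7)
    shortcut.append((sb2 + (M - 1) * sw2) / M)    # large-N approximation, (9.8)

if __name__ == "__main__":
    print("S^2              ", round(S2, 4))
    print("mean of (9.5)    ", round(average(naive), 4))
    print("mean of (9.7)    ", round(average(unbiased), 4))
    print("mean of (9.8)    ", round(average(shortcut), 4))
```

Averaged over all samples, the weighted form reproduces $S^2$, while the pooled mean square falls short here because the clusters differ much more from each other than the elements within them do.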

The two estimates $s_b^2$ (for the large unit) and $\hat{S}^2$ (for the small unit) are on a
common basis and may be substituted in theorem 9.1, corollary 2.

If the sample is large, the small units may be measured for a random subsample
of the large units (say 100 out of 600). Alternatively, two small units, chosen at
random from each large unit, might be measured. More than one size of small unit
may be investigated simultaneously, provided that we take data that give an
unbiased estimate of $S_w^2$ for each small unit.

With stratified sampling, the variances for the large and small units can be 
estimated by these methods separately in each stratum and then substituted in the 
appropriate formula for the variance of the estimate from a stratified sample. 


Example. The data come from a farm sample taken in North Carolina in 1942 in order 
to estimate farm employment (Finkner, Morgan, and Monroe, 1943). The method of 
drawing the sample was to locate points at random on the map and to choose as sampling 
units the three farms that were nearest to each point. This method is not recommended 
because a large farm has a greater chance of inclusion in the sample than a small farm, and 
an isolated farm has a greater chance than another in a densely farmed area. Any effects of
this bias will be ignored.
TABLE 9.6
SAMPLE ANALYSIS OF VARIANCE (NUMBER OF PAID WORKERS)
(SINGLE-FARM BASIS)

                                   df       ms
Between units within strata        825    6.218
Between farms within units        2768    2.918
Between farms within strata       3593    3.676

From the sample data for individual farms, the group of three farms can be compared
with the individual farm as a sampling unit. The item chosen is the number of paid workers.
The sample was stratified, the stratum being a group of townships similar in density of farm
population and in ratio of cropland to farmland. Since the sampling fraction was 1.9%, the
fpc can be ignored. The variance of the estimated population total then takes the form

$$V(\hat{Y}) = \sum_h \frac{N_h^2 S_h^2}{n_h}$$

The correct procedure is to compute $N_h^2 S_h^2 / n_h$ separately within each stratum for the two
types of unit, using an analysis of variance and expression (9.8). We will use a simpler
procedure as an approximation.

The strata contained in general between 300 and 450 farms, and either two or three
3-farm units were taken in each stratum to make the sampling approximately proportional.
Assuming proportionality, that is, $n_h / N_h = n / N$, we may write

$$V(\hat{Y}) = \frac{N}{n} \sum_h N_h S_h^2 = \frac{N^2}{n} \bar{S}_u^2$$

if we assume further that the $S_h^2$ do not vary greatly among strata, so that they may be
replaced by their average, $\bar{S}_u^2$.

Estimates of $\bar{S}_u^2$ are obtained from the analysis of variance in Table 9.6, which is on a
single-farm basis.

For the group of three farms, the mean square $s_b^2$ = 6.218 serves as the estimate of $\bar{S}_u^2$
on a single-farm basis. For the individual farm, using (9.8), we have

$$\hat{S}^2 = \frac{6.218 + 2(2.918)}{3} = 4.018$$

By theorem 9.1, corollary 2, the two figures, 6.218 for the group of three farms and 4.018
for the individual farm, indicate the relative variances obtained for a fixed total size of
sample. The group of farms gives about two thirds the precision of the single farm.
Consideration of costs would presumably make the result more favorable to the three-farm
unit.
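
The figures in this example follow directly from Table 9.6. The short sketch below recomputes them; the variable names are illustrative.

```python
# Mean squares and degrees of freedom from Table 9.6 (single-farm basis).
sb2, df_b = 6.218, 825      # between 3-farm units within strata
sw2, df_w = 2.918, 2768     # between farms within units
M = 3                       # farms per unit

# Pooled mean square "between farms within strata" (last line of the table).
pooled_ms = (df_b * sb2 + df_w * sw2) / (df_b + df_w)

# Estimated variance for the single farm as sampling unit, expression (9.8).
S2_hat = (sb2 + (M - 1) * sw2) / M

# Precision of the 3-farm unit relative to the single farm, for equal bulk of sample.
relative_precision = S2_hat / sb2

if __name__ == "__main__":
    print(round(pooled_ms, 3), round(S2_hat, 3), round(relative_precision, 3))
```

The pooled mean square (about 3.68) is noticeably smaller than the estimate from (9.8), which shows why the sample of farms cannot simply be treated as a random sample of single farms.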


9.4 VARIANCE IN TERMS OF INTRACLUSTER CORRELATION


Variance formulas are sometimes expressed in terms of the correlation coefficient
$\rho$ between elements in the same cluster. This approach has already been used
for systematic sampling (section 8.3).

Let $y_{ij}$ be the observed value for the jth element within the ith unit, and let $y_i$ be
the unit total. In cluster sampling we need to distinguish between two kinds of
average: the mean per unit $\bar{Y} = \sum y_i / N$ and the mean per element
$\bar{\bar{Y}} = \sum y_i / NM = \bar{Y} / M$. The variance among elements is

$$S^2 = \frac{\sum_{i,j} (y_{ij} - \bar{\bar{Y}})^2}{NM - 1}$$

The intracluster correlation coefficient $\rho$ was defined (section 8.3) as

$$\rho = \frac{E(y_{ij} - \bar{\bar{Y}})(y_{ik} - \bar{\bar{Y}})}{E(y_{ij} - \bar{\bar{Y}})^2} = \frac{2 \sum_i \sum_{j<k} (y_{ij} - \bar{\bar{Y}})(y_{ik} - \bar{\bar{Y}})}{(M-1)(NM-1)S^2} \qquad (9.9)$$

The number of terms (cross products) in the numerator E is NM(M - 1)/2, and
the denominator E is $(NM - 1)S^2 / NM$.


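Theorem 9.2. A simple random sample of n clusters, each containing M
elements, is drawn from the N clusters in the population. Then the sample mean
per element $\bar{\bar{y}}$ is an unbiased estimate of $\bar{\bar{Y}}$ with variance

$$V(\bar{\bar{y}}) = \frac{1-f}{n} \, \frac{NM-1}{M^2(N-1)} \, S^2 [1 + (M-1)\rho] \doteq \frac{1-f}{nM} S^2 [1 + (M-1)\rho] \qquad (9.10)$$

where $\rho$ is the intracluster correlation coefficient.

Proof. Let $y_i$ denote the total for the ith cluster and $\bar{y} = \sum y_i / n$. By theorems
2.1 and 2.2, $\bar{y}$ is an unbiased estimate of $\bar{Y}$ with variance

$$V(\bar{y}) = \frac{1-f}{n} \, \frac{\sum_i (y_i - \bar{Y})^2}{N-1}$$

But $\bar{y} = M\bar{\bar{y}}$ and $\bar{Y} = M\bar{\bar{Y}}$. Hence $\bar{\bar{y}}$ is an unbiased estimate of $\bar{\bar{Y}}$ with variance

$$V(\bar{\bar{y}}) = \frac{1-f}{nM^2} \, \frac{\sum_i (y_i - \bar{Y})^2}{N-1} \qquad (9.11)$$

But

$$(y_i - \bar{Y}) = (y_{i1} - \bar{\bar{Y}}) + (y_{i2} - \bar{\bar{Y}}) + \cdots + (y_{iM} - \bar{\bar{Y}})$$

Square and sum over all N clusters.

$$\begin{aligned}
\sum_i^N (y_i - \bar{Y})^2 &= \sum_i^N \sum_j^M (y_{ij} - \bar{\bar{Y}})^2 + 2 \sum_i \sum_{j<k} (y_{ij} - \bar{\bar{Y}})(y_{ik} - \bar{\bar{Y}}) \\
&= (NM-1)S^2 + (M-1)(NM-1)\rho S^2 \\
&= (NM-1)S^2 [1 + (M-1)\rho] \qquad (9.12)
\end{aligned}$$

using the definition of $\rho$ in (9.9). Substitute in (9.11) for $V(\bar{\bar{y}})$. This gives

$$V(\bar{\bar{y}}) = \frac{1-f}{n} \, \frac{NM-1}{M^2(N-1)} \, S^2 [1 + (M-1)\rho]$$

This completes the proof.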


If a simple random sample of nM elements is taken, the formula for $V(\bar{\bar{y}})$ is the
same as (9.10) except for the term in brackets. The factor

$$1 + (M-1)\rho$$

shows by how much the variance is changed by the use of a cluster instead of an
element as sampling unit. This factor is therefore Kish's (deff) for clusters of size M
(section 4.11). If $\rho > 0$, the cluster is less precise for a given bulk of sample. If
$\rho < 0$, as sometimes happens, the cluster is more precise. Theorem 9.2 is a simple
extension of theorem 8.2, p. 209.
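
Theorem 9.2 can be verified numerically on a small population by comparing the formula with the variance obtained by listing every possible sample of clusters. The population values and function names below are invented for illustration only.

```python
from itertools import combinations


def rho_and_S2(clusters):
    """Intracluster correlation from its definition (9.9), and the element variance S^2."""
    N, M = len(clusters), len(clusters[0])
    grand = sum(sum(c) for c in clusters) / (N * M)      # mean per element
    S2 = sum((v - grand) ** 2 for c in clusters for v in c) / (N * M - 1)
    cross = sum((c[j] - grand) * (c[k] - grand)
                for c in clusters for j, k in combinations(range(M), 2))
    rho = 2 * cross / ((M - 1) * (N * M - 1) * S2)
    return rho, S2


def variance_by_formula(clusters, n):
    """Exact form of (9.10), before terms in 1/N are dropped."""
    N, M = len(clusters), len(clusters[0])
    rho, S2 = rho_and_S2(clusters)
    f = n / N
    return (1 - f) / n * (N * M - 1) / (M ** 2 * (N - 1)) * S2 * (1 + (M - 1) * rho)


def variance_by_enumeration(clusters, n):
    """Variance of the sample mean per element over all simple random samples of n clusters."""
    N, M = len(clusters), len(clusters[0])
    grand = sum(sum(c) for c in clusters) / (N * M)
    means = [sum(sum(c) for c in s) / (n * M) for s in combinations(clusters, n)]
    return sum((m - grand) ** 2 for m in means) / len(means)


# Invented population: N = 5 clusters of M = 3 elements.
population = [[1, 2, 4], [3, 3, 5], [6, 9, 7], [8, 8, 12], [2, 10, 11]]

if __name__ == "__main__":
    rho, S2 = rho_and_S2(population)
    print("rho =", round(rho, 4), " deff =", round(1 + 2 * rho, 4))
    print(variance_by_formula(population, 2), variance_by_enumeration(population, 2))
```

The two variances agree to rounding error for any n, since (9.12) is an algebraic identity once $\rho$ is defined by (9.9).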


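An alternative expression can be given for $\rho$. Let $S_b^2$ denote the variance among
cluster totals, on a single unit basis. Then

$$\sum_i (y_i - \bar{Y})^2 = (N-1) M S_b^2$$

Equation 9.12 can be written as

$$(N-1) M S_b^2 = (NM-1) S^2 [1 + (M-1)\rho]$$

so that

$$\rho = \frac{(N-1) M S_b^2 - (NM-1) S^2}{(NM-1)(M-1) S^2} \doteq \frac{S_b^2 - S^2}{(M-1) S^2} \qquad (9.13)$$

when terms in 1/N are negligible.

The value of the within-cluster mean square

$$S_w^2 = \sum_i^N \sum_j^M (y_{ij} - \bar{y}_i)^2 / N(M-1) \qquad (9.14)$$

where $\bar{y}_i$ is the mean per element in the ith cluster, is worth noting. By the one-way
analysis of variance,

$$\begin{aligned}
(NM-1)S^2 &= \sum_i^N (y_i - \bar{Y})^2 / M + \sum_i^N \sum_j^M (y_{ij} - \bar{y}_i)^2 \\
&= \frac{NM-1}{M} S^2 [1 + (M-1)\rho] + N(M-1) S_w^2
\end{aligned}$$

by (9.12). Hence,

$$S_w^2 = \frac{NM-1}{NM} S^2 (1 - \rho) \doteq S^2 (1 - \rho) \qquad (9.15)$$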
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A good discussion of the numerical values of p for different items and different 
sizes of cluster is given by Hansen, Hurwitz, and Madow (1953), who regard p asa 
“measure of homogeneity” of the cluster. s 


9.5. VARIANCE FUNCTIONS 


In some types of surveys, for example, soil sampling, crop cutting, and surveys
of farming that utilize an areal sampling unit, the size of the cluster unit may be
capable of almost continuous variation. In the search for the best unit the problem
is not that of choosing between two or three specific sizes that have been tried but
of finding the optimum value of $M$ regarded as a continuous variable. This
problem requires a method of predicting the variance $S_b^2$ between units in the
population as a function of $M$. By the analysis of variance, $S_b^2$ can be found if we
know (a) the variance $S^2$ between all elements in the population and (b) the
variance $S_w^2$ between elements that lie in the same unit. Our approach is to predict
$S_w^2$ and $S^2$ and to find $S_b^2$ by the analysis of variance.

The sample data produce estimates of $S^2$ and $S_w^2$ for the size of unit actually
used. Since $S^2$ is the variance among elements, it is not affected by the size of the
unit. However, $S_w^2$ will be affected. It might be expected to increase as the size
of the large unit increases. If the large units to be examined differ little in size from
the unit actually used, a first approximation is to regard $S_w^2$ as constant, using the
estimate given by the sample data. An investigation by McVay (1947) suggests
that this approximation may often be satisfactory.

As a better approximation, attempts have been made (Jessen, 1942;
Mahalanobis, 1944; Hendricks, 1944) to develop a general law to predict how $S_w^2$
changes with the size of unit. In several agricultural surveys, $S_w^2$ appeared to be
related to $M$ by the empirical formula

$$S_w^2 = AM^g \qquad (g > 0) \tag{9.16}$$

where $A$ and $g$ are constants that do not depend on $M$. In this formula $S_w^2$
increases steadily as $M$ increases. Usually $g$ is small. A curve of this type might be
expected when there are forces that exert a similar influence on elements close
together. Climate, soil type, topography, and access to markets tend to give
neighboring farms similar features.

Theoretically, the formula is open to objection, since it makes $S_w^2$ increase
without bound as $M$ increases. If we assume, as seems reasonable, that there is no
correlation between elements that are far apart, a formula in which $S_w^2$
approaches an upper bound with large $M$ would be more appropriate. However, a
formula will suffice if it gives a good fit over the range of $M$ that is under
investigation.

If this formula fits, $\log S_w^2$ should plot as a straight line against $\log M$. Values of
$S_w^2$ for at least two values of $M$ are needed in order to estimate the constants $\log A$
and $g$. At least three values of $M$ are necessary for any appraisal of the linearity of
the fit.

From the analysis of variance in Table 9.5 (p. 239) we find

$$S_b^2 = \frac{(NM-1)S^2-N(M-1)S_w^2}{N-1}$$
$$= \frac{(NM-1)S^2-N(M-1)AM^g}{N-1}$$
$$\doteq MS^2-(M-1)AM^g \tag{9.17}$$
Hendricks (1944) has pointed out that the complete population might be
regarded as a single large sampling unit containing $NM$ elements. If (9.16) holds,
then $S^2 = A(NM)^g$. The advantage of this device is that the values of $A$ and $g$ can
now be estimated from the data for a survey in which only one value of $M$ was
used. The two equations that lead to the estimates are

$$\log S_w^2 = \log A + g\log M \tag{9.18}$$
$$\log S^2 = \log A + g\log(NM) \tag{9.19}$$

The formula for $S_b^2$ becomes [from (9.17)]

$$S_b^2 = AM^g[MN^g-(M-1)] \tag{9.20}$$

This method furnishes no check on the correctness of (9.16), which might hold
well enough for small values of $M$ but fail for a value as large as $NM$.
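
The two equations (9.18) and (9.19) can be solved directly for $g$ and $A$; a minimal Python sketch follows. The function names and the numerical values are hypothetical and serve only to illustrate the algebra, not to reproduce any survey.

```python
import math


def fit_variance_function(S2, Sw2, N, M):
    """Solve (9.18) and (9.19) for A and g, given S^2, S_w^2, N and M.

    Subtracting (9.18) from (9.19) gives log(S2 / Sw2) = g * log(N).
    """
    g = math.log(S2 / Sw2) / math.log(N)
    A = Sw2 / M ** g
    return A, g


def predict_Sb2(A, g, N, M):
    """Between-unit variance from (9.20)."""
    return A * M ** g * (M * N ** g - (M - 1))


# Hypothetical values: a population of N = 50 units of M = 4 elements whose
# variances happen to follow (9.16) with A = 2.0 and g = 0.1.
N_units, M_size = 50, 4
Sw2_obs = 2.0 * M_size ** 0.1
S2_obs = 2.0 * (N_units * M_size) ** 0.1
A_hat, g_hat = fit_variance_function(S2_obs, Sw2_obs, N_units, M_size)
Sb2_hat = predict_Sb2(A_hat, g_hat, N_units, M_size)
```

As the text notes, a fit of this kind uses only two points and so gives no check on whether (9.16) actually holds.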

Formula 9.16 is presented as an example of the methodology rather than as a
general law. The reader who faces a similar problem should construct and test
whatever type of formula seems most appropriate to his material. In some cases
$\log S_b^2$ might be a linear function of $M$, as Fairfield Smith (1938) suggested for
yield data.


9.6 A COST FUNCTION 


In an extensive survey the nature of the field costs plays a large part in
determining the optimum unit. As an illustration of the role of cost factors, we will
describe a cost function developed by Jessen (1942) for farm surveys in which the
large units are clusters of neighboring farms.

Two components of field cost are distinguished. The component $c_1Mn$ comprises
costs that vary directly with the total number of elements (farms). Thus $c_1$
contains the cost of the interview and the cost of travel from farm to farm within
the cluster.

The second component, $c_2\sqrt{n}$, measures the cost of travel between the clusters.
Tests on a map showed that this cost, for a fixed population, varies approximately
as the square root of the number of clusters. Total field cost is therefore

$$C = c_1Mn + c_2\sqrt{n} \tag{9.21}$$

Assuming simple random sampling and ignoring the fpc, the variance of the mean
per element $\bar{\bar{y}}$ is $S_b^2/nM$. From (9.17), this equals

$$V(\bar{\bar{y}}) = \frac{S^2-(M-1)AM^{g-1}}{n} \tag{9.22}$$

To determine the optimum size of unit, we find $M$, and incidentally $n$, to
minimize $V$ for fixed $C$. The general solution is complicated, although its
application in a numerical problem presents no great difficulty.

By some manipulation we can obtain the equation that gives the optimum $M$.
First solve the cost equation (9.21) as a quadratic in $\sqrt{n}$. This gives

$$\frac{2c_1M\sqrt{n}}{c_2} = \left(1+\frac{4Cc_1M}{c_2^2}\right)^{1/2} - 1 \tag{9.23}$$

The equation to be minimized is

$$C+\lambda V = c_1Mn + c_2\sqrt{n} + \lambda V$$

Differentiating, and noting that $\partial V/\partial n = -V/n$, we obtain the equations

$$n:\quad c_1M + \tfrac{1}{2}c_2n^{-1/2} - \frac{\lambda V}{n} = 0 \tag{9.24}$$
$$M:\quad c_1n + \lambda\frac{\partial V}{\partial M} = 0 \tag{9.25}$$

Divide (9.25) by (9.24) to eliminate $\lambda$. This leads to

$$\frac{n}{V}\frac{\partial V}{\partial M} = -\frac{c_1n}{c_1M+\tfrac{1}{2}c_2n^{-1/2}}$$

or

$$\frac{M}{V}\frac{\partial V}{\partial M} = -\frac{1}{1+c_2/2c_1M\sqrt{n}} \tag{9.26}$$

If we substitute for $\sqrt{n}$ from (9.23), we obtain, after some simplification,

$$\frac{M}{V}\frac{\partial V}{\partial M} = \left(1+\frac{4Cc_1M}{c_2^2}\right)^{-1/2} - 1 \tag{9.27}$$

By writing out the left side of this equation in full and changing signs on both
sides, we find

$$\frac{AM^{g-1}[gM-(g-1)]}{S^2-(M-1)AM^{g-1}} = 1-\left(1+\frac{4Cc_1M}{c_2^2}\right)^{-1/2} \tag{9.28}$$

This equation gives the optimum $M$. The left side does not involve any of the cost
factors, being dependent only on the shape of the variance function. Both sides
can be seen to be increasing functions of $M$, for $g > 0$, $M \geq 1$, within the region of
interest. Suppose that the solution has been found for specified values of $C$, $c_1$, and
$c_2$, and we wish to examine the effect of an increase in $c_1$ on this solution. The left
side does not depend on $c_1$, but the right side increases as $c_1$ increases. However,
the optimum $M$ is found to decrease because of the term $c_1M$ on the right. A
decrease in $c_2$ produces a similar effect.

Now $c_1$ increases if the length of interview increases, whereas $c_2$ decreases if
travel becomes cheaper or if the farms in a given area become denser. These facts
lead to the conclusion that the optimum size of unit becomes smaller when

    length of interview increases
    travel becomes cheaper
    the elements (farms) become more dense
    total amount of money used (C) increases

This conclusion is a consequence of the type of cost function and would require
reexamination with a different function. It illustrates the fact that the optimum
unit is not a fixed characteristic of the population, but depends also on the type of
survey and on the levels of prices and wages.
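
The numerical search for the optimum $M$ can be sketched in Python as follows. For each trial $M$ the number of clusters $n$ that the budget allows is found from (9.23), the variance from (9.22), and the $M$ with the smallest variance is kept; the difference between the two sides of (9.28) is then evaluated at that $M$. All function names and the values of $C$, $c_1$, $c_2$, $S^2$, $A$, and $g$ are hypothetical, chosen only so that the optimum falls inside the grid.

```python
import math


def clusters_affordable(C, c1, c2, M):
    """n that satisfies C = c1*M*n + c2*sqrt(n), using (9.23)."""
    root_n = c2 * (math.sqrt(1 + 4 * C * c1 * M / c2 ** 2) - 1) / (2 * c1 * M)
    return root_n ** 2


def variance_of_mean(S2, A, g, M, n):
    """V of the mean per element from (9.22), fpc ignored."""
    return (S2 - (M - 1) * A * M ** (g - 1)) / n


def condition_gap(C, c1, c2, S2, A, g, M):
    """Left side minus right side of (9.28); zero at the optimum M."""
    lhs = A * M ** (g - 1) * (g * M - (g - 1)) / (S2 - (M - 1) * A * M ** (g - 1))
    rhs = 1 - (1 + 4 * C * c1 * M / c2 ** 2) ** -0.5
    return lhs - rhs


def optimum_M(C, c1, c2, S2, A, g, grid):
    """Grid value of M that minimizes the variance for fixed total cost C."""
    return min(grid, key=lambda M: variance_of_mean(
        S2, A, g, M, clusters_affordable(C, c1, c2, M)))


# Hypothetical inputs; M is treated as continuous on a grid from 1 to 10.
grid = [1 + 0.001 * k for k in range(9001)]
params = dict(C=2000, c1=2, c2=60, S2=1.0, A=0.8, g=0.05)
M_opt = optimum_M(grid=grid, **params)
gap_at_optimum = condition_gap(M=M_opt, **params)
```

A grid search is used here instead of solving (9.28) by iteration simply because it is short and makes no assumption about the shape of either side.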

Hansen, Hurwitz, and Madow (1953) give an excellent discussion of the 
construction of cost functions for surveys involving cluster sampling. 


9.7 CLUSTER SAMPLING FOR PROPORTIONS 


The same techniques apply to cluster sampling for proportions. Suppose that
the $M$ elements in any cluster can be classified into two classes and that $p_i = a_i/M$
is the proportion in class $C$ in the $i$th cluster. A simple random sample of $n$ clusters
is taken, and the average $p$ of the observed $p_i$ in the sample is used as the estimate
of the population proportion $P$.

It will be recalled (section 3.12) that we cannot use binomial theory to find $V(p)$
but must apply the formula for continuous variates to the $p_i$. This gives

$$V(p) = \frac{N-n}{N}\,\frac{\sum_{i=1}^{N}(p_i-P)^2}{n(N-1)} \doteq \frac{N-n}{nN^2}\sum_{i=1}^{N}(p_i-P)^2 \tag{9.29}$$

Alternatively, if we take a simple random sample of $nM$ elements, the variance of
$p$ is obtained by binomial theory (theorem 3.2) as

$$V_{bin}(p) = \frac{(NM-nM)}{(NM-1)}\,\frac{PQ}{nM} \doteq \frac{N-n}{N}\,\frac{PQ}{nM} \tag{9.30}$$

if $N$ is large. Consequently the factor, the deff,

$$\frac{V(p)}{V_{bin}(p)} \doteq \frac{M\sum_{i=1}^{N}(p_i-P)^2}{NPQ} \qquad (N \text{ large}) \tag{9.31}$$

shows the relative change in the variance caused by the use of clusters. Numerical
values of this factor are helpful in making preliminary estimates of sample size
with cluster sampling. The required sample size is first estimated by the binomial
formula and then multiplied by the factor to indicate the size that will be necessary
with cluster sampling. For an illustration, see Cornfield (1951).
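
The factor (9.31) is easy to compute when the counts $a_i$ are available for every cluster. The sketch below is a minimal Python illustration; the function name and the two small data sets are hypothetical.

```python
def cluster_deff_for_proportion(a, M):
    """Factor (9.31): M * sum (p_i - P)^2 / (N * P * Q), for equal cluster size M.

    a[i] is the number of elements in class C in cluster i.
    """
    N = len(a)
    p = [ai / M for ai in a]
    P = sum(p) / N
    Q = 1 - P
    return M * sum((pi - P) ** 2 for pi in p) / (N * P * Q)


# Hypothetical extremes: clusters that are entirely in or out of class C give
# a factor equal to M; clusters with identical proportions give zero.
deff_all_or_none = cluster_deff_for_proportion([0, 4, 4, 0], M=4)
deff_identical = cluster_deff_for_proportion([2, 2, 2, 2], M=4)
```

A binomial sample size would then be multiplied by the computed factor, as described above.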

If the cluster sizes $M_i$ are variable, the estimate $p = \sum a_i/\sum M_i$ is a ratio estimate.
Its variance is given approximately by the formula (section 3.12)

$$V(p) \doteq \frac{N-n}{Nn\bar{M}^2}\,\frac{\sum_{i=1}^{N}M_i^2(p_i-P)^2}{N-1} \tag{9.32}$$

where $\bar{M} = \sum M_i/N$ is the average size of cluster.
If this sample is compared with a simple random sample of $n\bar{M}$ elements, we
find, as a generalization of (9.31),

$$\frac{V(p)}{V_{bin}(p)} \doteq \frac{\sum_{i=1}^{N}M_i^2(p_i-P)^2}{N\bar{M}PQ} \tag{9.33}$$

As with continuous variates, the relationship of size of cluster to between-cluster
variance can be investigated, either by expressing the factor in (9.31) and
(9.33) as a function of $M$, or by seeking a relation between the within-cluster
variance and $M$. If we assign the value 1 to any unit that falls in class $C$ and 0 to any
other unit, the fundamental analysis of variance equation for fixed $M$ is

$$NMP(1-P) = M\sum_{i}^{N}(p_i-P)^2 + M\sum_{i}^{N}p_i(1-p_i) \tag{9.34}$$

    total ss = ss between clusters + ss within clusters

From this relation the mean square within clusters can be computed and plotted as
a function of $M$. McVay (1947) describes how this analysis can be used to
investigate optimum cluster size.


EXERCISES 


9.1 For the data in Table 9.1 compare the relative net precisions of the four types of 
unit when the object is to estimate the total number of seedlings in the bed with a standard 
error of 200 seedlings. (Note that the fpc is involved.) 

9.2 For the data in Table 3.5 (p. 67) estimate the relative precision of the household to
the individual for estimating the sex ratio and the proportion of people who had seen a
doctor in the past 12 months, assuming simple random sampling.

9.3 A population consisting of 2500 elements is divided into 10 strata, each containing
50 large units composed of five elements. The analysis of variance of the population for an
item is as follows, on an element basis.

                                            df      ms
    Between strata                           9    30.6
    Between large units within strata      490     3.0
    Between elements within large units   2000     1.6

Ignoring the fpc, is the relative precision of the large to the small unit greater with simple
random sampling than with stratified random sampling (proportional allocation)?

9.4 A population containing $LNM$ elements is divided into $L$ strata, each having $N$
large units, each of which contains $M$ small units. The following quantities come from the
analysis of variance of the population, on an element basis.

    $S_1^2$ = mean square between strata
    $S_2^2$ = mean square between large units within strata
    $S_3^2$ = mean square between elements within strata

If $N$ is large and the fpc is ignored, show that the relative precision of the large to the small
unit (element) is improved by stratification if
(M=1) M_1
$7 S53?


9.5 In a rural survey in which the sampling unit is a cluster of $M$ farms, the cost of taking
a sample of $n$ units is

$$C = 4tMn + 60\sqrt{n}$$

where $t$ is the time in hours spent getting the answers from a single farmer. If \$2000 is spent
on the survey, the values of $n$ for $M = 1, 5, 10$; $t = \tfrac{1}{2}, 2$, work out as follows.

                        M
                   1      5     10
    t = 1/2 hr   400    131     74
    t = 2 hr     156     40     21

Verify two of these values to ensure that you understand the use of the formula.
The variance of the sample mean (ignoring the fpc) is

$$\frac{S^2}{Mn}[1+(M-1)\rho]$$

If $\rho = 0.1$ for all $M$ between 1 and 10, which size of unit is most precise for (a) $t = \tfrac{1}{2}$ hr, (b)
$t = 2$ hr? How do you explain the difference in results?

9.6 If \$5000 were available for the survey, would you expect the optimum size of unit
to decrease or increase (relative to that for \$2000)? Give reasons. You may, if you wish,
find the optimum size in order to check your argument.


CHAPTER 9A 


Single-Stage Cluster Sampling: 
Clusters of Unequal Sizes 


9A.1 CLUSTER UNITS OF UNEQUAL SIZES 


In most applications the cluster units (e.g., counties, cities, city blocks) contain
different numbers of elements or subunits (areal units, households, persons). This
chapter deals with some of the numerous methods of sample selection and
estimation that have been produced for cluster units of unequal sizes. Let $M_i$ be
the number of elements in the $i$th unit. For the estimation of the population total $Y$
of the $y_{ij}$, two methods are already familiar to us.


Simple Random Sample of Clusters: Unbiased Estimate
As before, let

$$y_i = \sum_{j=1}^{M_i} y_{ij} = M_i\bar{y}_i$$

denote the item total for the $i$th cluster unit. Given a simple random sample of $n$ of
the $N$ population units, an unbiased estimate of $Y$ is (by theorem 2.1, corollary)

$$\hat{Y}_u = \frac{N}{n}\sum_{i=1}^{n} y_i \tag{9A.1}$$

By theorem 2.2, its variance is

$$V(\hat{Y}_u) = \frac{N^2(1-f)}{n}\,\frac{\sum_{i=1}^{N}(y_i-\bar{Y})^2}{N-1} \tag{9A.2}$$

where $\bar{Y} = Y/N$ is the population mean per cluster unit.

The estimate $\hat{Y}_u$ is often found to be of poor precision. This occurs when the $\bar{y}_i$
(means per element) vary little from unit to unit and the $M_i$ vary greatly. In this
event the $y_i = M_i\bar{y}_i$ also vary greatly from unit to unit and the variance (9A.2) is
large.



Simple Random Sample of Clusters: Ratio-to-Size Estimate
Let

$$M_0 = \sum_{i=1}^{N} M_i = \text{total number of elements in the population.}$$

If the $M_i$ and hence $M_0$ are all known, an alternative is a ratio estimate in which
$M_i$ is taken as the auxiliary variate $x_i$.

$$\hat{Y}_R = M_0\,\frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} M_i} = M_0\ (\text{sample mean per element})$$

In the notation of the ratio estimate the population ratio $R = Y/X = Y/M_0 = \bar{\bar{Y}}$,
the population mean per element. By theorem 6.1, assuming that the number of
clusters in the sample is large,

$$V(\hat{Y}_R) \doteq \frac{N^2(1-f)}{n}\,\frac{\sum_{i=1}^{N}(y_i-M_i\bar{\bar{Y}})^2}{N-1} \tag{9A.3}$$
$$= \frac{N^2(1-f)}{n}\,\frac{\sum_{i=1}^{N}M_i^2(\bar{y}_i-\bar{\bar{Y}})^2}{N-1} \tag{9A.4}$$

As (9A.4) shows, the variance of $\hat{Y}_R$ depends on the variability among the means
per element and is often found to be much smaller than $V(\hat{Y}_u)$.

Note that $\hat{Y}_R$ requires a knowledge of the total $M_0$ of all the $M_i$, while $\hat{Y}_u$ does
not. The reverse is true when we are estimating the population mean per element.
In this case the corresponding estimates are

$$\hat{\bar{\bar{Y}}}_u = \frac{\hat{Y}_u}{M_0} = \frac{N}{nM_0}\sum_{i=1}^{n} y_i, \qquad \hat{\bar{\bar{Y}}}_R = \frac{\hat{Y}_R}{M_0} = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} M_i} = \text{sample mean per element}$$

Thus, $\hat{\bar{\bar{Y}}}_R$ requires knowledge of only the $M_i$ that fall in the selected sample.
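
The two estimates of the total can be compared on a small example. The Python sketch below uses a hypothetical population in which every unit has the same mean per element, the situation in which the ratio-to-size estimate is exact while the unbiased estimate varies widely from sample to sample. The function names and data are assumptions made for illustration.

```python
from itertools import combinations


def unbiased_total(y_sample, N):
    """Estimate (9A.1): N times the sample mean of the cluster totals."""
    return N * sum(y_sample) / len(y_sample)


def ratio_total(y_sample, M_sample, M0):
    """Ratio-to-size estimate: M0 times the sample mean per element."""
    return M0 * sum(y_sample) / sum(M_sample)


# Hypothetical population of N = 7 units in which y_i = 2 * M_i exactly.
sizes = [3, 1, 11, 6, 4, 2, 3]
totals = [2 * m for m in sizes]
N, M0, Y = len(sizes), sum(sizes), sum(totals)

# Enumerate every simple random sample of n = 2 units.
samples = list(combinations(range(N), 2))
u_estimates = [unbiased_total([totals[i] for i in s], N) for s in samples]
r_estimates = [ratio_total([totals[i] for i in s], [sizes[i] for i in s], M0)
               for s in samples]
mean_of_u = sum(u_estimates) / len(samples)
```

Here every ratio estimate equals $Y$, whereas the unbiased estimates equal $Y$ only on average over all samples.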


9A.2 SAMPLING WITH PROBABILITY PROPORTIONAL TO SIZE 


If all the $M_i$ are known, another technique, developed by Hansen and Hurwitz
(1943), is to select the units with probabilities proportional to their sizes $M_i$. One
method of selecting a single unit is illustrated in the following small population of
$N = 7$ units.

            Size                 Assigned
    Unit    M_i     Sum of M_i   Range
    1        3          3         1-3
    2        1          4         4
    3       11         15         5-15
    4        6         21        16-21
    5        4         25        22-25
    6        2         27        26-27
    7        3         30        28-30

The cumulative sums of the $M_i$ are formed. To select a unit, draw a random
number between 1 and $M_0 = 30$. Suppose that this is 19. In the sum, number 19
falls in unit 4, which covers the range from numbers 16 to 21, inclusive. With this
method of drawing, the probability that any unit is selected is proportional
to its size.
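
The cumulative method can be written in a few lines of Python. In this sketch the function name `select_unit_cumulative` is hypothetical; the sizes are those of the table above, and the random number is passed in explicitly so that the worked case (random number 19, unit 4) can be reproduced.

```python
import bisect
from itertools import accumulate


def select_unit_cumulative(sizes, r):
    """Return the 1-based unit whose assigned range contains r (1 <= r <= M0)."""
    cumulative = list(accumulate(sizes))        # 3, 4, 15, 21, 25, 27, 30
    # First position whose cumulative sum is at least r.
    return bisect.bisect_left(cumulative, r) + 1


sizes = [3, 1, 11, 6, 4, 2, 3]
chosen = select_unit_cumulative(sizes, 19)      # the example in the text
```

In practice `r` would be drawn uniformly from 1 to $M_0$, for example with `random.randint(1, sum(sizes))`.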

This method of selecting a unit is convenient when $N$ is only moderate, or in
stratified sampling when the $N_h$ are moderate or small, but the cumulation of the
$M_i$ can be time-consuming with $N$ large (e.g., $N = 20{,}000$). For this case, Lahiri
(1951) has given an alternative method that avoids the cumulation. Let $M_{max}$
be the largest of the $M_i$. Draw a random number between 1 and $N$;
suppose this is $i$. Now draw another random number $m$ between 1 and $M_{max}$. If $m$
is less than or equal to $M_i$ the $i$th unit is selected. If not, try another pair of random
numbers. Naturally, this method involves the fewest rejections when the $M_i$ do
not differ too much in size.
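
Lahiri's rejection method, as described above, can be sketched in Python as follows. The function name and the use of a seeded `random.Random` generator are assumptions made so that the example is reproducible.

```python
import random


def select_unit_lahiri(sizes, rng):
    """Select one unit (1-based) with probability proportional to size.

    Repeatedly draw a unit i uniformly from 1..N and a number m uniformly
    from 1..M_max; accept unit i if m <= M_i, otherwise draw a new pair.
    """
    N = len(sizes)
    M_max = max(sizes)
    while True:
        i = rng.randint(1, N)
        m = rng.randint(1, M_max)
        if m <= sizes[i - 1]:
            return i


sizes = [3, 1, 11, 6, 4, 2, 3]
rng = random.Random(1)
draws = [select_unit_lahiri(sizes, rng) for _ in range(30000)]
frequencies = [draws.count(u) / len(draws) for u in range(1, len(sizes) + 1)]
```

With these sizes a pair is accepted with probability $M_0/(N M_{max}) = 30/77$, so on average between two and three pairs are needed per selection.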

Now consider n > 1. Assume at present that sampling is with replacement. To 
select a second unit by the cumulative method, draw a new random number 
between 1 and 30. However, unlike sampling without replacement, we do not 
forbid the selection of unit 4 a second time. With this rule, the probabilities of 
selection remain proportional to the sizes at each draw. An advantage of selection 
with replacement is that the formulas for the true and estimated variances of the 
estimates are simple. 

In sampling without replacement, on the other hand (section 9A.6), keeping the
selection probabilities proportional to the chosen sizes is more difficult and sooner
or later becomes impossible as $n$ increases. This may be seen in the extreme
(although impractical) case $n = 7$ in the preceding example. If selection were made
without replacement, every unit would be certain to be chosen, irrespective of the
original sizes $M_i$. However, for stratified sampling in which the $N_h$ are small, much
research has been done (section 9A.6 ff) to develop practical methods of sampling
with unequal probabilities without replacement.


9A.3 SELECTION WITH UNEQUAL PROBABILITIES WITH REPLACEMENT


Let the $i$th unit be selected with probability $M_i/M_0$ and with replacement,
where $M_0 = \sum M_i$. We will show that an unbiased estimate of the population total
$Y$ is

$$\hat{Y}_{pps} = \frac{M_0}{n}\sum_{i=1}^{n}\frac{y_i}{M_i} = \frac{M_0}{n}\sum_{i=1}^{n}\bar{y}_i = M_0\ (\text{mean of the unit means per element}) \tag{9A.5}$$

For comparison with subsequent methods, this estimate will be denoted by $\hat{Y}_{pps}$.
Furthermore,

$$V(\hat{Y}_{pps}) = \frac{M_0}{n}\sum_{i=1}^{N}M_i(\bar{y}_i-\bar{\bar{Y}})^2 \tag{9A.6}$$

so that the variance of $\hat{Y}_{pps}$, like that of $\hat{Y}_R$, depends on the variability of the unit
means per element.


In some applications the sizes $M_i$ are known only approximately. In others the
“size” is not the number of elements in the unit but a measure of its bigness that is
thought to be highly correlated with the unit total $y_i$. For instance, the “size” of a
hospital might be measured by the total number of beds or by the average number
of occupied beds over some time period. Similarly, various measures of the “size”
of a restaurant, a bank, or an agricultural district can be devised. Consequently,
we will consider a measure of size $M_i'$ and a corresponding probability of selection
$z_i = M_i'/M_0'$, where $M_0' = \sum^{N} M_i'$. As far as the theoretical results are concerned,
the $z_i$ can be any set of positive numbers that add to 1 over the population. It will
be shown that

$$\hat{Y}_{ppz} = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{z_i} \tag{9A.7}$$

is an unbiased estimate of $Y$ with variance

$$V(\hat{Y}_{ppz}) = \frac{1}{n}\sum_{i=1}^{N}z_i\left(\frac{y_i}{z_i}-Y\right)^2 \tag{9A.8}$$


The proofs utilize a method introduced in section 2.10. Let $t_i$ be the number of
times that the $i$th unit appears in a specific sample of size $n$, where $t_i$ may have any
of the values $0, 1, 2, \ldots, n$. Consider the joint frequency distribution of the $t_i$ for
all $N$ units in the population.

The method of drawing the sample is equivalent to the standard probability
problem in which $n$ balls are thrown into $N$ boxes, the probability that a ball goes
into the $i$th box being $z_i$ at every throw. Consequently the joint distribution of the
$t_i$ is the multinomial expression

$$\frac{n!}{t_1!\,t_2!\cdots t_N!}\,z_1^{t_1}z_2^{t_2}\cdots z_N^{t_N}$$

For the multinomial, the following properties of the distribution of the $t_i$ are well
known.

$$E(t_i) = nz_i, \qquad V(t_i) = nz_i(1-z_i), \qquad \text{Cov}(t_it_j) = -nz_iz_j \tag{9A.9}$$


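Theorem 9A.1. If a sample of $n$ units is drawn with probabilities $z_i$ and with
replacement, then

$$\hat{Y}_{ppz} = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{z_i} \tag{9A.10}$$

is an unbiased estimate of $Y$ with variance

$$V(\hat{Y}_{ppz}) = \frac{1}{n}\sum_{i=1}^{N}z_i\left(\frac{y_i}{z_i}-Y\right)^2 \tag{9A.11}$$

Proof. We may write

$$\hat{Y}_{ppz} = \frac{1}{n}\left(t_1\frac{y_1}{z_1}+t_2\frac{y_2}{z_2}+\cdots+t_N\frac{y_N}{z_N}\right) = \frac{1}{n}\sum_{i=1}^{N}t_i\frac{y_i}{z_i}$$

where the sum extends over all units in the population. In repeated sampling the
$t$'s are the random variables, whereas the $y_i$ and the $z_i$ are a set of fixed numbers.
Hence, since $E(t_i) = nz_i$ by (9A.9),

$$E(\hat{Y}_{ppz}) = \frac{1}{n}\sum_{i=1}^{N}nz_i\frac{y_i}{z_i} = \sum_{i=1}^{N}y_i = Y$$

so that $\hat{Y}_{ppz}$ is unbiased. Also,

$$V(\hat{Y}_{ppz}) = \frac{1}{n^2}\left[\sum_{i=1}^{N}\left(\frac{y_i}{z_i}\right)^2V(t_i) + 2\sum_{i=1}^{N}\sum_{j>i}\frac{y_i}{z_i}\frac{y_j}{z_j}\,\text{Cov}(t_it_j)\right]$$
$$= \frac{1}{n}\left[\sum_{i=1}^{N}\left(\frac{y_i}{z_i}\right)^2z_i(1-z_i) - 2\sum_{i=1}^{N}\sum_{j>i}\frac{y_i}{z_i}\frac{y_j}{z_j}z_iz_j\right] \tag{9A.13}$$
$$= \frac{1}{n}\left(\sum_{i=1}^{N}\frac{y_i^2}{z_i}-Y^2\right) = \frac{1}{n}\sum_{i=1}^{N}z_i\left(\frac{y_i}{z_i}-Y\right)^2 \tag{9A.11}$$

since $\sum z_i = 1$. This completes the proof.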
Taking $z_i = M_i/M_0$ in theorem 9A.1 gives the corresponding results for sampling
with probability proportional to size.

An alternative expression for $V(\hat{Y}_{ppz})$ can be given. From (9A.13),

$$nV(\hat{Y}_{ppz}) = \sum_{i=1}^{N}\frac{y_i^2}{z_i}(1-z_i) - 2\sum_{i=1}^{N}\sum_{j>i}y_iy_j \tag{9A.14}$$

Since $(1-z_i)$ equals the sum of all other $z$'s in the population, the coefficient of
$y_i^2/z_i$ in (9A.14) contains a term $z_j$ for any $j \neq i$. Similarly, the coefficient of $y_j^2/z_j$
contains a term $z_i$. Hence,

$$V(\hat{Y}_{ppz}) = \frac{1}{n}\sum_{i=1}^{N}\sum_{j>i}\left(\frac{y_i^2z_j}{z_i}+\frac{y_j^2z_i}{z_j}-2y_iy_j\right)$$
$$= \frac{1}{n}\sum_{i=1}^{N}\sum_{j>i}z_iz_j\left(\frac{y_i}{z_i}-\frac{y_j}{z_j}\right)^2 \tag{9A.15}$$


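Theorem 9A.2. If a sample of $n$ units is drawn with probability proportional
to $z_i$ with replacement, an unbiased sample estimate of $V(\hat{Y}_{ppz})$ is, for any $n > 1$,

$$v(\hat{Y}_{ppz}) = \sum_{i=1}^{n}\left(\frac{y_i}{z_i}-\hat{Y}_{ppz}\right)^2 \Big/ n(n-1) \tag{9A.16}$$

Proof. By the usual algebraic identity,

$$\sum_{i=1}^{n}\left(\frac{y_i}{z_i}-\hat{Y}_{ppz}\right)^2 = \sum_{i=1}^{n}\left(\frac{y_i}{z_i}-Y\right)^2 - n(\hat{Y}_{ppz}-Y)^2 \tag{9A.17}$$

Hence, from (9A.16)

$$E[n(n-1)v(\hat{Y}_{ppz})] = E\sum_{i=1}^{n}\left(\frac{y_i}{z_i}-Y\right)^2 - nV(\hat{Y}_{ppz}) \tag{9A.18}$$

by definition of $V(\hat{Y}_{ppz})$. Introducing the variables $t_i$, we get

$$n(n-1)E[v(\hat{Y}_{ppz})] = E\sum_{i=1}^{N}t_i\left(\frac{y_i}{z_i}-Y\right)^2 - nV(\hat{Y}_{ppz})$$
$$= n\sum_{i=1}^{N}z_i\left(\frac{y_i}{z_i}-Y\right)^2 - nV(\hat{Y}_{ppz})$$
$$= n(n-1)V(\hat{Y}_{ppz}) \tag{9A.19}$$

using (9A.11) in theorem 9A.1. This completes the proof.

Since $\hat{Y}_{ppz}$ is the mean of the $n$ values $y_i/z_i$, formula (9A.16) has a very simple
form.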
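
Theorems 9A.1 and 9A.2 can be checked exactly on a small population by enumerating every ordered with-replacement sample of size $n = 2$ together with its probability $z_iz_j$. The Python sketch below does this; the function names, the values $y_i$, and the probabilities $z_i$ are hypothetical.

```python
from itertools import product


def ppz_estimate(y, z):
    """Estimate (9A.10) from sampled values y_i and their probabilities z_i."""
    n = len(y)
    return sum(yi / zi for yi, zi in zip(y, z)) / n


def ppz_variance_estimate(y, z):
    """Sample estimate (9A.16) of the variance of the ppz estimate (n > 1)."""
    n = len(y)
    est = ppz_estimate(y, z)
    return sum((yi / zi - est) ** 2 for yi, zi in zip(y, z)) / (n * (n - 1))


def ppz_true_variance(y_pop, z_pop, n):
    """Population variance (9A.11)."""
    Y = sum(y_pop)
    return sum(zi * (yi / zi - Y) ** 2 for yi, zi in zip(y_pop, z_pop)) / n


# Hypothetical population of N = 4 units.
y_pop = [2.0, 5.0, 9.0, 4.0]
z_pop = [0.1, 0.2, 0.3, 0.4]

expected_estimate = 0.0
expected_v = 0.0
for i, j in product(range(4), repeat=2):        # all ordered samples, n = 2
    prob = z_pop[i] * z_pop[j]
    ys, zs = [y_pop[i], y_pop[j]], [z_pop[i], z_pop[j]]
    expected_estimate += prob * ppz_estimate(ys, zs)
    expected_v += prob * ppz_variance_estimate(ys, zs)
true_variance = ppz_true_variance(y_pop, z_pop, n=2)
```

The enumeration reproduces $E(\hat{Y}_{ppz}) = Y$ and $E[v(\hat{Y}_{ppz})] = V(\hat{Y}_{ppz})$ up to rounding error.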


When selection is strictly proportional to size (i.e., $z_i = M_i/M_0$), theorems 9A.1 and
9A.2 are expressed as follows.


Theorem 9A.3. If a sample of $n$ units is drawn with probabilities proportional
to size, $z_i = M_i/M_0$, and with replacement,

$$\hat{Y}_{pps} = \frac{M_0}{n}\sum_{i=1}^{n}\frac{y_i}{M_i} = \frac{M_0}{n}\sum_{i=1}^{n}\bar{y}_i = M_0\bar{\bar{y}} \tag{9A.20}$$

where $\bar{\bar{y}}$ is the unweighted mean of the unit means, is an unbiased estimate of $Y$
with variance

$$V(\hat{Y}_{pps}) = \frac{M_0}{n}\sum_{i=1}^{N}M_i(\bar{y}_i-\bar{\bar{Y}})^2 \tag{9A.21}$$

These results follow from theorem 9A.1, since $\bar{y}_i = y_i/M_i$ and $\bar{\bar{Y}} = Y/M_0$.


Theorem 9A.4. Under the conditions of theorem 9A.3, an unbiased sample
estimate of $V(\hat{Y}_{pps})$ is

$$v(\hat{Y}_{pps}) = M_0^2\sum_{i=1}^{n}(\bar{y}_i-\bar{\bar{y}})^2 \Big/ n(n-1) \tag{9A.22}$$

The result follows by substituting $z_i = M_i/M_0$ in (9A.16), since $\bar{y}_i = y_i/M_i$ and
$\hat{Y}_{pps} = M_0\bar{\bar{y}}$.


9A.4 THE OPTIMUM MEASURE OF SIZE 


In cases in which the measure of size $M_i'$ is some estimate of the bigness of the
unit, a question of interest is: what measure of size minimizes the variance of $\hat{Y}_{ppz}$?
Now,

$$V(\hat{Y}_{ppz}) = \frac{1}{n}\sum_{i=1}^{N}z_i\left(\frac{y_i}{z_i}-Y\right)^2$$

This expression becomes zero if $z_i \propto y_i$: that is, $z_i = y_i/Y$. If the $y_i$ are all positive,
this set of $z_i$ is an acceptable set of probabilities. Consequently, the best measures
of size are numbers proportional to the item totals $y_i$ for the units.

This result is not of direct practical application; if the $y_i$ were known in advance
for the whole population the sample would be unnecessary. The result suggests
that if the $y_i$ are relatively stable through time, the most recently available
previous values of the $y_i$ may be the best measures of size for this item. In practice,
of course, a single measure of size must be used for all items in selecting the
sample. If there is a choice between different measures of size, the measure most
nearly proportional to the unit totals of the principal items is likely to be best.


9A.5 RELATIVE ACCURACIES OF THREE TECHNIQUES


In this section we compare the accuracies of three preceding techniques for
estimating the population total with cluster units of unequal sizes (assuming that
the $M_i$ are known if the technique requires them).


1. Selection: equal probabilities. Estimate: $\hat{Y}_u$.
2. Selection: equal probabilities. Estimate: $\hat{Y}_R$.
3. Selection: probability $\propto$ size. Estimate: $\hat{Y}_{pps}$.

There is no simple rule for deciding which is most accurate. The issue depends
on the relation between $y_i$ and $M_i$ and on the variance of $y_i$ as a function of $M_i$. The
situation favorable to the ratio and pps estimates is that in which $\bar{y}_i$ is unrelated to
$M_i$. The situation favorable to $\hat{Y}_u$ is that in which the unit total $y_i$ is unrelated to $M_i$.

Some guidance can be obtained by expressing the variances of the three
estimates in a comparable form. We assume that $(N-1) \doteq N$ and write
$E(y_i-\bar{Y})^2 = \sum (y_i-\bar{Y})^2/N$. We also assume that the bias of $\hat{Y}_R$ is negligible.
For $\hat{Y}_u$ we have, from (9A.2),

$$nV(\hat{Y}_u) = N^2(1-f)E(y_i-\bar{Y})^2 = (1-f)E(Ny_i-Y)^2 \tag{9A.23}$$

For $\hat{Y}_R$, from (9A.4),

$$nV(\hat{Y}_R) = N^2(1-f)E\,M_i^2(\bar{y}_i-\bar{\bar{Y}})^2 = (1-f)E\left(\frac{M_i}{\bar{M}}\right)^2(M_0\bar{y}_i-Y)^2 \tag{9A.24}$$

where $\bar{M} = \sum M_i/N = M_0/N$.
From (9A.21), for $\hat{Y}_{pps}$,

$$nV(\hat{Y}_{pps}) = NM_0E\,M_i(\bar{y}_i-\bar{\bar{Y}})^2 = E\left(\frac{M_i}{\bar{M}}\right)(M_0\bar{y}_i-Y)^2 \tag{9A.25}$$

From (9A.23), (9A.24), (9A.25), we see that $V(\hat{Y}_u)$ depends on the accuracy of
the quantities $Ny_i = NM_i\bar{y}_i$ as estimates of $Y$, while $V(\hat{Y}_R)$ and $V(\hat{Y}_{pps})$ depend on
the accuracy of the quantities $M_0\bar{y}_i = M_0y_i/M_i$ as estimates of $Y$. If $\bar{y}_i$ is unrelated
to $M_i$, we expect the $M_0\bar{y}_i$ to be more accurate than the $NM_i\bar{y}_i$ and the reverse if $y_i$
is unrelated to $M_i$.

As regards $\hat{Y}_R$ and $\hat{Y}_{pps}$, note from (9A.24) and (9A.25) that $V(\hat{Y}_R)$ gives
relatively greater weight to large units than $V(\hat{Y}_{pps})$. Note also that $\hat{Y}_u$ and $\hat{Y}_R$
benefit from the fpc term, which can become substantial in small strata (e.g., with

opment of unequal probability selection without replacement. Formula (9A.24) for
$\hat{Y}_R$, of course, holds only in large samples.
Further comparisons among the methods have been made from an infinite
population model by Cochran, 2nd ed., Des Raj (1954, 1958), Yates (1960),
Zarcovic (1960), and Foreman and Brewer (1971). Most writers assume that the
finite population is a random sample from an infinite superpopulation, in which

$$y_i = \alpha+\beta M_i+e_i, \qquad E(e_i \mid M_i) = 0 \tag{9A.26}$$

which is hoped to approximate the relation that holds in many surveys. Some
assumption must also be made about the variance of $e_i$ in clusters of given size.
From (9.12) in Section 9.4, we get on dividing by $(N-1) \doteq N$,

$$V(e_i) = V(y_i \mid M_i) \doteq M_iS^2[M_i\rho+(1-\rho)]$$

As an approximation this suggests $V(e_i) = cM_i^g$ where $1 < g < 2$ in most applications.
From the model (9A.26) we get

$$\bar{Y} = \alpha+\beta\bar{M}+\bar{e}_N \doteq \alpha+\beta\bar{M} \tag{9A.27}$$

We assume here that $\bar{e}_N$ is negligible; this amounts to ignoring the fpc.
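It follows that

$$\frac{nV(\hat{Y}_u)}{N^2} \doteq E(y_i-\bar{Y})^2 = E[\beta(M_i-\bar{M})+e_i]^2 = \beta^2V(M_i)+cE(M_i^g) \tag{9A.28}$$

$$\frac{nV(\hat{Y}_R)}{N^2} \doteq E\,M_i^2(\bar{y}_i-\bar{\bar{Y}})^2 = E(y_i-M_i\bar{\bar{Y}})^2 = E[\alpha(1-M_i/\bar{M})+e_i]^2 = \frac{\alpha^2}{\bar{M}^2}V(M_i)+cE(M_i^g) \tag{9A.29}$$

$$\frac{nV(\hat{Y}_{pps})}{N^2} = \bar{M}E\,M_i(\bar{y}_i-\bar{\bar{Y}})^2 = \bar{M}E\left[\frac{(y_i-M_i\bar{\bar{Y}})^2}{M_i}\right] = \alpha^2\left[\bar{M}E\left(\frac{1}{M_i}\right)-1\right]+c\bar{M}E(M_i^{g-1}) \tag{9A.30}$$

Hence, approximately comparable variances under this model are

$$V_u = \beta^2V(M_i)+cE(M_i^g) \tag{9A.31}$$
$$V_R = \frac{\alpha^2}{\bar{M}^2}V(M_i)+cE(M_i^g) \tag{9A.32}$$
$$V_{pps} \doteq \frac{\alpha^2}{\bar{M}^2}V(M_i)+c\bar{M}E(M_i^{g-1}) \tag{9A.33}$$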
Consider first $\alpha = 0$, the case in which $\bar{y}_i$ is unrelated to $M_i$. Clearly,

$$V_R < V_u$$

If $\beta$ (which in this case becomes $\bar{\bar{Y}}$) is large, the superiority of the ratio estimate
may be great. With $\alpha = 0$,

$$V_{pps} \leq V_R \quad \text{for } g \geq 1$$

This holds because the covariance of $M_i$ and $M_i^{g-1}$ is positive if $g > 1$ and zero if
$g = 1$, so that with $g \geq 1$,

$$E(M_i^g) = E[(M_i)(M_i^{g-1})] \geq \bar{M}E(M_i^{g-1})$$

If $\beta = 0$ and $\alpha \neq 0$, so that the unit total $y_i$ is unrelated to $M_i$, $\hat{Y}_u$ always beats $\hat{Y}_R$
and beats $\hat{Y}_{pps}$ except possibly in the unlikely case $g = 2$ with $\beta = 0$.

When neither $\alpha$ nor $\beta$ vanishes, the relative performances of $\hat{Y}_u$ and $\hat{Y}_R$, $\hat{Y}_{pps}$
depend on the relative sizes of $|\alpha|$ and $|\beta|$. For instance, $\hat{Y}_R$ beats $\hat{Y}_u$ if
$\beta^2 > \alpha^2/\bar{M}^2$, as noted by Foreman and Brewer (1971).

As regards $V_R$ versus $V_{pps}$, the coefficients of $\alpha^2$ are approximately the same in
$V_R$ and $V_{pps}$. From the terms in $c$, $V_{pps} < V_R$ in the case $g > 1$ expected to apply to
most applications that have $\beta \neq 0$, while $V_{pps} > V_R$ if $g < 1$.
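
The comparisons in this section can be explored numerically by evaluating (9A.28), (9A.29), and (9A.30) for a given set of cluster sizes. The Python sketch below is illustrative only: the function name, the cluster sizes, and the values of $\alpha$, $\beta$, $c$, and $g$ are hypothetical, and the pps variance is computed from the exact $\alpha^2$ term of (9A.30) and not from the approximation (9A.33).

```python
def model_variances(M, alpha, beta, c, g):
    """Return (V_u, V_R, V_pps), each equal to n V / N^2 under model (9A.26)."""
    N = len(M)
    M_bar = sum(M) / N
    var_M = sum((m - M_bar) ** 2 for m in M) / N
    E_Mg = sum(m ** g for m in M) / N               # E(M^g)
    E_Mg1 = sum(m ** (g - 1) for m in M) / N        # E(M^(g-1))
    E_inv = sum(1 / m for m in M) / N               # E(1/M)
    V_u = beta ** 2 * var_M + c * E_Mg                              # (9A.28)
    V_R = alpha ** 2 / M_bar ** 2 * var_M + c * E_Mg                # (9A.29)
    V_pps = alpha ** 2 * (M_bar * E_inv - 1) + c * M_bar * E_Mg1    # (9A.30)
    return V_u, V_R, V_pps


sizes = [3, 1, 11, 6, 4, 2, 3]
# Mean per element unrelated to size (alpha = 0), g = 1.5.
Vu_a, VR_a, Vpps_a = model_variances(sizes, alpha=0.0, beta=2.0, c=1.0, g=1.5)
# Same case with g = 1: the ratio and pps variances coincide.
Vu_b, VR_b, Vpps_b = model_variances(sizes, alpha=0.0, beta=2.0, c=1.0, g=1.0)
# Unit total unrelated to size (beta = 0, alpha != 0).
Vu_c, VR_c, Vpps_c = model_variances(sizes, alpha=5.0, beta=0.0, c=1.0, g=1.5)
```

With these inputs the first case gives $V_{pps} < V_R < V_u$ and the third gives $V_u < V_R$, in line with the discussion above.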

The results from this model agree with the conclusions suggested earlier in this
section. If $\bar{y}_i$ shows no trend or only a moderate trend as $M_i$ increases, the ratio
method with equal probabilities and the pps method are 
unbiased estimation with equal probabilities and may 
superior if the unit total yi is unrelated to M,. There i 
ratio and the pps estimate 
estimate is usually more pri 

The estimates $\hat{Y}_u$ and $\hat{Y}_R$ are helped by the fpc term when this is appreciable.
To anticipate later work (section 9A.12), the evidence suggests that the best pps 


equal-probability sampling. 


9A.6 SAMPLING WITH UNEQUAL PROBABILITIES WITHOUT REPLACEMENT


Much of this work was produced for extensive surveys in which the cluster units
had first been stratified by some other principle (e.g., geographic location) into a
substantial number of relatively small strata, only a small number of cluster units
being drawn from each stratum. The case $n_h = 2$, which provides one degree of


ing sampling errors, is of particular 


remaining units is selected with assigned probabiliti 
probability 7; that the ith unit will be selected ateith 


is

$$\pi_i = z_i+\sum_{j\neq i}^{N}\frac{z_jz_i}{(1-z_j)} = z_i\left(1+\sum_{j\neq i}^{N}\frac{z_j}{1-z_j}\right) \tag{9A.34}$$
$$= z_i\left(1+A-\frac{z_i}{1-z_i}\right) \tag{9A.35}$$

where $A = \sum z_j/(1-z_j)$ taken over all $N$ units.

Suppose that $\pi_i = 2z_i$, the relative probabilities of selection of the units remaining
proportional to the measure of size $z_i$. The simple estimator introduced by
Horvitz and Thompson (1952),

$$\hat{Y}_{HT} = \sum_{i=1}^{2}\frac{y_i}{\pi_i} = \sum_{i=1}^{2}\frac{y_i}{2z_i}$$

will then have zero variance if the $z_i$ are proportional to the $y_i$, since $z_i = y_i/Y$ and
every selected unit will give the correct estimate $y_i/z_i = Y$. However, in (9A.35),
the quantities $z_i' = \pi_i/2$ are always closer to equality than the original $z_i$ because
of the second factor in (9A.35). In the example given by Yates and Grundy (1953),
with $N = 4$, $n = 2$, $z_i = 0.1$, 0.2, 0.3, and 0.4, the $z_i'$ are found to be 0.1173,
0.2206, 0.3042, and 0.3579.
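
The figures quoted from Yates and Grundy can be reproduced from (9A.35). The Python sketch below assumes, as (9A.34) does, that the first unit is drawn with probabilities $z_i$ and the second with probabilities proportional to the $z_j$ of the remaining units; the function name is hypothetical.

```python
def inclusion_probabilities_n2(z):
    """pi_i from (9A.35) for two successive draws without replacement."""
    A = sum(zj / (1 - zj) for zj in z)
    return [zi * (1 + A - zi / (1 - zi)) for zi in z]


z = [0.1, 0.2, 0.3, 0.4]
pi = inclusion_probabilities_n2(z)
z_prime = [p / 2 for p in pi]    # compare with 0.1173, 0.2206, 0.3042, 0.3579
```

The $\pi_i$ add to $n = 2$, and the $z_i'$ are drawn toward equality relative to the original $z_i$.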

Three approaches have been used to cope with this distortion of the intended
probabilities of selection. One is to retain the preceding natural method of sample
selection, but make appropriate changes in the method of estimating the population
total. One is to use a different method of sample selection that keeps $\pi_i = nz_i$
subject to certain restrictions on the values of the $nz_i$. One is to accept some
distortion of the probabilities if this has compensating advantages (e.g., in simplicity
or generality). Examples of the three methods will be given in section 9A.8 and
following sections.

First, we present the best-known general estimate of the population total for 
unequal probability sampling without replacement. 


9A.7 THE HORVITZ-THOMPSON ESTIMATOR 


A sample of n units is selected, without replacement, by some method. Let 
m; = probability that the ith unit is in the sample 
m; = probability that the ith and jth units are both in the sample 


The following relations hold: 


N N N 
Mmm Pk aa YY my =3n(n—- 1) (9A.36) 
i i j>i 
To establish the second relation, let P(s) denote the probability of a sample 
consisting of n specified units. Then 77, = =P(s) over all samples containing the ith 
and jth units, and 7; ==P(s) over all samples containing the ith unit. When we 
take 27, for j #i, every P(s) for a sample containing the ith unit is counted (n — 1) 
times in the sum, since there are (7 — 1) other values of j in the sample. This proves 
the second relation. The third relation follows from the second. 
The Horvitz-Thompson (1952) estimator of the population total is 

$$\hat{Y}_{HT} = \sum_{i}^{n} \frac{y_i}{\pi_i} \tag{9A.37}$$

where $y_i$ is the measurement for the $i$th unit. 

Theorem 9A.5. If $\pi_i > 0$ $(i = 1, 2, \ldots, N)$, 

$$\hat{Y}_{HT} = \sum_{i}^{n} \frac{y_i}{\pi_i}$$

is an unbiased estimator of $Y$, with variance 

$$V(\hat{Y}_{HT}) = \sum_{i=1}^{N} \frac{(1-\pi_i)}{\pi_i}\, y_i^2 + 2\sum_{i=1}^{N}\sum_{j>i}^{N} \frac{(\pi_{ij} - \pi_i\pi_j)}{\pi_i\pi_j}\, y_i y_j \tag{9A.38}$$

where $\pi_{ij}$ is the probability that units $i$ and $j$ both are in the sample. 
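Proof. Let $t_i$ $(i = 1, 2, \ldots, N)$ be a random variable that takes the value 1 if the 
$i$th unit is drawn and zero otherwise. Then $t_i$ follows the binomial distribution for a 
sample of size 1, with probability $\pi_i$. Thus 

$$E(t_i) = \pi_i, \qquad V(t_i) = \pi_i(1 - \pi_i) \tag{9A.39}$$

The value of $\mathrm{Cov}(t_i t_j)$ is also required. Since $t_i t_j$ is 1 only if both units appear in the 
sample, 

$$\mathrm{Cov}(t_i t_j) = E(t_i t_j) - E(t_i)E(t_j) = \pi_{ij} - \pi_i\pi_j \tag{9A.40}$$

Hence, regarding the $y_i$ as fixed and the $t_i$ as random variables, 

$$E(\hat{Y}_{HT}) = E\left(\sum_{i=1}^{N} \frac{t_i y_i}{\pi_i}\right) = \sum_{i=1}^{N} y_i = Y$$

$$V(\hat{Y}_{HT}) = \sum_{i}^{N} \left(\frac{y_i}{\pi_i}\right)^2 V(t_i) + 2\sum_{i}^{N}\sum_{j>i}^{N} \frac{y_i}{\pi_i}\frac{y_j}{\pi_j}\, \mathrm{Cov}(t_i t_j)$$

$$= \sum_{i}^{N} \frac{(1-\pi_i)}{\pi_i}\, y_i^2 + 2\sum_{i}^{N}\sum_{j>i}^{N} \frac{(\pi_{ij} - \pi_i\pi_j)}{\pi_i\pi_j}\, y_i y_j \tag{9A.41}$$

This proves the theorem. 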


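This variance may be expressed in another form by using the first two of the 
relations in (9A.36). These give 

$$\sum_{j \ne i}^{N} (\pi_{ij} - \pi_i\pi_j) = (n-1)\pi_i - \pi_i(n - \pi_i) = -\pi_i(1 - \pi_i)$$

Substituting for $(1 - \pi_i)$ in the first term in (9A.41), 

$$\sum_{i}^{N} \frac{(1-\pi_i)}{\pi_i}\, y_i^2 = \sum_{i}^{N}\sum_{j \ne i}^{N} (\pi_i\pi_j - \pi_{ij})\left(\frac{y_i}{\pi_i}\right)^2 = \sum_{i}^{N}\sum_{j>i}^{N} (\pi_i\pi_j - \pi_{ij})\left[\left(\frac{y_i}{\pi_i}\right)^2 + \left(\frac{y_j}{\pi_j}\right)^2\right]$$

Hence, 

$$V(\hat{Y}_{HT}) = \sum_{i}^{N}\sum_{j>i}^{N} (\pi_i\pi_j - \pi_{ij})\left[\left(\frac{y_i}{\pi_i}\right)^2 + \left(\frac{y_j}{\pi_j}\right)^2 - \frac{2y_iy_j}{\pi_i\pi_j}\right]$$

$$= \sum_{i}^{N}\sum_{j>i}^{N} (\pi_i\pi_j - \pi_{ij})\left(\frac{y_i}{\pi_i} - \frac{y_j}{\pi_j}\right)^2 \tag{9A.42}$$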
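
The relations (9A.36) and the two variance forms (9A.41) and (9A.42) can be verified numerically on a small population. The following is a sketch only: the sizes `z`, the values `y`, and the successive-draw design used to generate $P(s)$ are illustrative assumptions, and any fixed-size design could be substituted.

```python
from itertools import combinations, permutations

def successive_pps_design(z, n):
    """P(s) for each unordered sample s under successive draws without
    replacement, with probabilities proportional to the remaining sizes."""
    P = {}
    for order in permutations(range(len(z)), n):
        prob, remaining = 1.0, 1.0
        for i in order:
            prob *= z[i] / remaining
            remaining -= z[i]
        s = tuple(sorted(order))
        P[s] = P.get(s, 0.0) + prob
    return P

z = [0.1, 0.2, 0.3, 0.4]      # illustrative sizes (assumption)
y = [1.0, 3.0, 2.0, 5.0]      # illustrative unit values (assumption)
n, N = 2, len(z)
P = successive_pps_design(z, n)

# Inclusion probabilities obtained by summing P(s), as in the proof of (9A.36)
pi = [sum(p for s, p in P.items() if i in s) for i in range(N)]
pij = {(i, j): sum(p for s, p in P.items() if i in s and j in s)
       for i, j in combinations(range(N), 2)}

Y = sum(y)
ht = {s: sum(y[i] / pi[i] for i in s) for s in P}            # (9A.37)
mean_ht = sum(P[s] * ht[s] for s in P)
var_direct = sum(P[s] * (ht[s] - Y) ** 2 for s in P)

var_41 = (sum((1 - pi[i]) / pi[i] * y[i] ** 2 for i in range(N))
          + 2 * sum((pij[i, j] - pi[i] * pi[j]) / (pi[i] * pi[j]) * y[i] * y[j]
                    for i, j in pij))                          # (9A.41)
var_42 = sum((pi[i] * pi[j] - pij[i, j]) * (y[i] / pi[i] - y[j] / pi[j]) ** 2
             for i, j in pij)                                  # (9A.42)
print(mean_ht, var_direct, var_41, var_42)
```

The mean of the estimator over all samples equals $Y$, and the directly computed variance agrees with both formulas.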
Corollary. From (9A.41), using the $t_i$ method, an unbiased sample estimator 
of $V(\hat{Y}_{HT})$ is seen to be 

$$v_1(\hat{Y}_{HT}) = \sum_{i}^{n} \frac{(1-\pi_i)}{\pi_i^2}\, y_i^2 + 2\sum_{i}^{n}\sum_{j>i}^{n} \frac{(\pi_{ij} - \pi_i\pi_j)}{\pi_i\pi_j\pi_{ij}}\, y_i y_j \tag{9A.43}$$

provided that none of the $\pi_{ij}$ in the population vanishes. 

A different sample estimator has been given by Yates and Grundy (1953) and 
by Sen (1953). From (9A.42), this estimator is 

$$v_2(\hat{Y}_{HT}) = \sum_{i}^{n}\sum_{j>i}^{n} \frac{(\pi_i\pi_j - \pi_{ij})}{\pi_{ij}}\left(\frac{y_i}{\pi_i} - \frac{y_j}{\pi_j}\right)^2 \tag{9A.44}$$

with the same restriction on the $\pi_{ij}$. 

Since the terms $(\pi_i\pi_j - \pi_{ij})$ often vary widely, being sometimes negative, $v_1$ and 
$v_2$ tend to be unstable quantities. Both estimators can assume negative values for 
some sample selection methods. Rao and Singh (1973) compared the coefficients 
of variation of $v_1$ and $v_2$ in samples of $n = 2$ from 34 small natural populations 
found in books and papers on sample surveys, using Brewer’s sample selection 
method (section 9A.8), for which $\pi_i = 2z_i$ as desired. The estimator $v_2$ was 
considerably more stable as well as being always $\ge 0$ for this method, while $v_1$ 
frequently took negative values. 
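
To illustrate the two variance estimators, the sketch below computes $v_1$ and $v_2$ for every possible sample of a small population and confirms that each averages to the true variance. The population, the successive-draw design, and the function names are assumptions chosen for illustration; they are not the populations studied by Rao and Singh.

```python
from itertools import combinations, permutations

def successive_pps_design(z, n):
    """P(s) for each unordered sample s under successive pps draws without replacement."""
    P = {}
    for order in permutations(range(len(z)), n):
        prob, remaining = 1.0, 1.0
        for i in order:
            prob *= z[i] / remaining
            remaining -= z[i]
        s = tuple(sorted(order))
        P[s] = P.get(s, 0.0) + prob
    return P

def v1(s, y, pi, pij):
    """Horvitz-Thompson form (9A.43) for the sample s (a sorted tuple of units)."""
    first = sum((1 - pi[i]) / pi[i] ** 2 * y[i] ** 2 for i in s)
    cross = sum((pij[i, j] - pi[i] * pi[j]) / (pi[i] * pi[j] * pij[i, j]) * y[i] * y[j]
                for i, j in combinations(s, 2))
    return first + 2 * cross

def v2(s, y, pi, pij):
    """Yates-Grundy-Sen form (9A.44) for the sample s."""
    return sum((pi[i] * pi[j] - pij[i, j]) / pij[i, j]
               * (y[i] / pi[i] - y[j] / pi[j]) ** 2
               for i, j in combinations(s, 2))

z = [0.1, 0.2, 0.3, 0.4]      # illustrative sizes (assumption)
y = [1.0, 3.0, 2.0, 5.0]      # illustrative unit values (assumption)
n, N = 2, len(z)
P = successive_pps_design(z, n)
pi = [sum(p for s, p in P.items() if i in s) for i in range(N)]
pij = {(i, j): sum(p for s, p in P.items() if i in s and j in s)
       for i, j in combinations(range(N), 2)}

true_var = sum((pi[i] * pi[j] - pij[i, j]) * (y[i] / pi[i] - y[j] / pi[j]) ** 2
               for i, j in pij)                               # (9A.42)
mean_v1 = sum(P[s] * v1(s, y, pi, pij) for s in P)
mean_v2 = sum(P[s] * v2(s, y, pi, pij) for s in P)
negative_v1 = [s for s in P if v1(s, y, pi, pij) < 0]          # samples where v1 < 0
print(true_var, mean_v1, mean_v2, negative_v1)
```

Unbiasedness holds for both estimators; listing the samples with negative values shows how individual estimates can still be unusable.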

We turn now to methods of sample selection. The bibliography by Brewer and 
Hanif (1969) mentions over 30 methods, of which 4 methods will be described. 
Most methods become steadily more complex as $n$ increases beyond 2; a few 
extend fairly easily. The presentation will give most attention first to $n = 2$, both 
for simplicity and because this is a common situation in nationwide samples using 
many small strata with $n_h = 2$. 


9A.8 BREWER’S METHOD 


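For $n = 2$ this method of sample selection keeps $\pi_i = 2z_i$ and uses the Horvitz-
Thompson estimator 

$$\hat{Y}_{HT} = \frac{y_i}{\pi_i} + \frac{y_j}{\pi_j} = \frac{1}{2}\left(\frac{y_i}{z_i} + \frac{y_j}{z_j}\right)$$

Using different approaches, methods produced by Brewer (1963), Rao (1965), 
and Durbin (1967) all gave the same $\pi_i$ and $\pi_{ij}$ values. We assume every $z_i < 1/2$. 

Brewer draws the first unit with what he calls revised probabilities proportional 
to $z_i(1 - z_i)/(1 - 2z_i)$, and the second unit with probabilities $z_i/(1 - z_j)$, where $j$ is 
the unit drawn first. The divisor needed to convert the $z_i(1 - z_i)/(1 - 2z_i)$ into 
actual probabilities is their sum 

$$D = \sum_{i=1}^{N} \frac{z_i(1 - z_i)}{(1 - 2z_i)} = \frac{1}{2}\sum_{i=1}^{N} z_i\left(1 + \frac{1}{1 - 2z_i}\right) = \frac{1}{2}\left(1 + \sum_{i=1}^{N} \frac{z_i}{1 - 2z_i}\right) \tag{9A.45}$$

With $n = 2$ the probability that the $i$th unit is drawn is the sum of the 
probabilities that it was drawn first and drawn second. Thus 

$$\pi_i = \frac{z_i(1 - z_i)}{D(1 - 2z_i)} + \sum_{j \ne i}^{N} \frac{z_j(1 - z_j)}{D(1 - 2z_j)}\,\frac{z_i}{(1 - z_j)}$$

$$= \frac{z_i}{D}\left[1 + \frac{z_i}{1 - 2z_i} + \sum_{j \ne i}^{N} \frac{z_j}{1 - 2z_j}\right] = \frac{z_i}{D}\left[1 + \sum_{j=1}^{N} \frac{z_j}{1 - 2z_j}\right] = 2z_i \tag{9A.46}$$

noting (9A.45) for $D$. Similarly, 

$$\pi_{ij} = \frac{z_i z_j}{D}\left(\frac{1}{1 - 2z_i} + \frac{1}{1 - 2z_j}\right) = \frac{2z_i z_j}{D}\,\frac{(1 - z_i - z_j)}{(1 - 2z_i)(1 - 2z_j)} \tag{9A.47}$$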


Since this method uses the HT estimate, theorem 9A.5 and its corollary provide 
formulas for the variance and estimated variance of $\hat{Y}_B$. 

The method has two desirable properties. First, Brewer (1963) has shown that its 
variance is always less than that of the estimate $\hat{Y}_{ppz}$ in sampling with replacement. 
Second, some algebra shows (Rao, 1965) that $(\pi_i\pi_j - \pi_{ij}) > 0$ for all $i \ne j$, so that 
the Yates-Grundy estimate $v_2$ of the variance is always positive. 
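
A short sketch of Brewer's method for $n = 2$ follows. It evaluates $D$ and the $\pi_{ij}$ from (9A.45) and (9A.47), checks that $\pi_i = 2z_i$ and that every $(\pi_i\pi_j - \pi_{ij})$ is positive, and evaluates the variance (9A.42) for population A of Table 9A.1. The function names `brewer_joint_probs` and `draw_brewer` are illustrative assumptions, and the sampler relies on Python's standard `random.choices`.

```python
import random
from itertools import combinations

def brewer_joint_probs(z):
    """pi_ij of (9A.47) for Brewer's method with n = 2 (every z_i < 1/2)."""
    D = 0.5 * (1 + sum(zi / (1 - 2 * zi) for zi in z))             # (9A.45)
    return {(i, j): z[i] * z[j] / D * (1 / (1 - 2 * z[i]) + 1 / (1 - 2 * z[j]))
            for i, j in combinations(range(len(z)), 2)}

def draw_brewer(z, rng=random):
    """Draw one sample of two distinct units by Brewer's two-step rule."""
    units = range(len(z))
    # First draw: revised probabilities proportional to z(1 - z)/(1 - 2z)
    first = rng.choices(units, weights=[zi * (1 - zi) / (1 - 2 * zi) for zi in z])[0]
    # Second draw: proportional to the sizes of the remaining units
    second = rng.choices(units, weights=[0.0 if j == first else z[j] for j in units])[0]
    return first, second

z = [0.1, 0.1, 0.2, 0.3, 0.3]      # relative sizes, Table 9A.1
y = [0.3, 0.5, 0.8, 0.9, 1.5]      # population A, Table 9A.1
pij = brewer_joint_probs(z)
# For n = 2 every sample is a pair, so pi_i is the sum of pi_ij over j
pi = [sum(p for pair, p in pij.items() if i in pair) for i in range(len(z))]
var_ht = sum((pi[i] * pi[j] - pij[i, j]) * (y[i] / pi[i] - y[j] / pi[j]) ** 2
             for i, j in pij)                                       # (9A.42)
print([round(p, 4) for p in pi], round(var_ht, 3))
```

The variance obtained this way can be compared with the entry for Brewer's method under population A in Table 9A.2.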

Durbin’s (1967) approach draws the first unit ($i$) with probability $z_i$. If unit $i$ was 
drawn first, the probability that unit $j$ is drawn second is made proportional to 

$$z_j\left[\frac{1}{(1 - 2z_i)} + \frac{1}{(1 - 2z_j)}\right] \tag{9A.48}$$

In this case the divisor of the proportions is 

$$\sum_{j \ne i}^{N} z_j\left[\frac{1}{(1 - 2z_i)} + \frac{1}{(1 - 2z_j)}\right] = \frac{(1 - z_i)}{(1 - 2z_i)} + \sum_{j \ne i}^{N} \frac{z_j}{(1 - 2z_j)} = 1 + \sum_{j=1}^{N} \frac{z_j}{(1 - 2z_j)} \tag{9A.49}$$

Thus the divisor is equal to $2D$ in Brewer’s (9A.45). The probability that the $i$th 
unit was drawn first and the $j$th unit second is, therefore, 

$$P(i)P(j|i) = \frac{z_i z_j}{2D}\left[\frac{1}{(1 - 2z_i)} + \frac{1}{(1 - 2z_j)}\right] \tag{9A.50}$$

By symmetry, this equals $P(j)P(i|j)$, so that Durbin’s $\pi_{ij}$ is the same as Brewer’s in 
(9A.47). 

Sampford (1967) has extended this method to samples of size $n$, provided 
$nz_i < 1$ for all units in the population. With his method of sample selection, the 
probability that the sample consists, for example, of units $1, 2, \ldots, n$ is a natural 
extension of (9A.47), being proportional to 

$$\left(1 - \sum_{i=1}^{n} z_i\right)\prod_{i=1}^{n} \frac{z_i}{(1 - nz_i)} \tag{9A.51}$$

For this method it can be shown that $\pi_i = nz_i$. A formula for $\pi_{ij}$ is given, with 
advice on its calculation by computer. The HT estimator of $Y$ is used, so that 
formulas are available for its variance and estimated variance. The Yates-Grundy 
estimator $v_2$ in (9A.44) is always positive. Several methods of actually drawing the 
sample so as to satisfy (9A.51) are suggested by Sampford. One is to draw the first 
unit with probabilities $z_i$ and all subsequent units with probabilities proportional 
to $z_i/(1 - nz_i)$ with replacement. If a sample with $n$ distinct units is obtained, this is 
accepted. An attempt at a sample is rejected as soon as a unit appears twice. This 
method can be seen to lead to (9A.51). As a guide to its speed, a formula is given 
for the expected number of attempts required to obtain a sample. 
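
The rejective drawing procedure just described can be sketched as follows, together with an exact evaluation of the sample probabilities implied by (9A.51). The function names are hypothetical, the sizes are those of Table 9A.1, and $n = 3$ is chosen only so that $nz_i < 1$ holds for every unit.

```python
import random
from itertools import combinations
from math import prod

def sampford_design(z, n):
    """Normalized sample probabilities proportional to (9A.51)."""
    lam = [zi / (1 - n * zi) for zi in z]
    weight = {s: (1 - sum(z[i] for i in s)) * prod(lam[i] for i in s)
              for s in combinations(range(len(z)), n)}
    total = sum(weight.values())
    return {s: w / total for s, w in weight.items()}

def draw_sampford(z, n, rng=random):
    """Rejective procedure: first unit with probabilities z_i, later units with
    probabilities proportional to z_i/(1 - n z_i), with replacement; the attempt
    is abandoned as soon as a unit appears twice."""
    units = range(len(z))
    lam = [zi / (1 - n * zi) for zi in z]
    while True:
        sample = [rng.choices(units, weights=z)[0]]
        while len(sample) < n:
            u = rng.choices(units, weights=lam)[0]
            if u in sample:
                break                  # repeated unit: reject this attempt
            sample.append(u)
        else:
            return sorted(sample)      # n distinct units obtained: accept

z = [0.1, 0.1, 0.2, 0.3, 0.3]
n = 3
P = sampford_design(z, n)
pi = [sum(p for s, p in P.items() if i in s) for i in range(len(z))]
print([round(p, 4) for p in pi])       # to be compared with n * z_i
print(draw_sampford(z, n, random.Random(1)))
```

The enumeration gives the design exactly, so the property $\pi_i = nz_i$ can be checked without simulation; the sampler is included only to show the accept and reject steps.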

For $n = 2$ Durbin’s (1967) method of drawing the sample, unlike the Brewer 
method, has the property that the unconditional probability of drawing unit $i$ is $z_i$ 
at both the first and the second draws. In multistage sampling in surveys repeated 
at regular intervals, Fellegi (1963) pointed out earlier that it is necessary or 
advisable to drop units and replace them from time to time on some regular 
pattern called a rotation scheme, because of the undesirability of long-continued 
questioning of the same persons. He produced a method of selection of successive 
units that also has the Durbin property. His method, based on iterative calculations, 
is similar to the Brewer-Durbin method, but has slightly different $\pi_{ij}$. 


9A.9 MURTHY’S METHOD 


This method uses the first selection technique suggested (section 9A.6), the 
successive units being drawn with probabilities $z_i$, $z_j/(1 - z_i)$, $z_k/(1 - z_i - z_j)$, and 
so on. Murthy’s estimator (1957) follows earlier work by Des Raj (1956a), 
who produced ingenious unbiased estimates based on the specific order in which 
the $n$ units in the sample were drawn. Murthy showed that corresponding to any 
ordered estimate of this class we can construct an unordered estimate that is also 
unbiased and has smaller variance. 

His proposed estimator is 

$$\hat{Y}_M = \frac{\sum_{i}^{n} P(s|i)\, y_i}{P(s)} \tag{9A.52}$$

where 

$P(s|i)$ = conditional probability of getting the set of units that was drawn, given 
that the $i$th unit was drawn first 

$P(s)$ = unconditional probability of getting the set of units that was drawn 
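
Murthy's estimator (9A.52) can be evaluated by enumerating the possible orders of draw. The sketch below does this for $n = 2$ on population A of Table 9A.1 and checks unbiasedness; the helper names `order_prob` and `murthy_estimate` are illustrative assumptions.

```python
from itertools import combinations, permutations

def order_prob(z, order):
    """Probability of drawing the units in this order, each draw being made
    with probability proportional to the sizes of the units still available."""
    prob, remaining = 1.0, 1.0
    for i in order:
        prob *= z[i] / remaining
        remaining -= z[i]
    return prob

def murthy_estimate(z, y, s):
    """Return (P(s), Y_M) for the unordered sample s, following (9A.52)."""
    orders = list(permutations(s))
    P_s = sum(order_prob(z, o) for o in orders)
    total = 0.0
    for i in s:
        # P(s|i): probability of obtaining the rest of s, given unit i came first
        P_s_given_i = sum(order_prob(z, o) for o in orders if o[0] == i) / z[i]
        total += P_s_given_i * y[i]
    return P_s, total / P_s

z = [0.1, 0.1, 0.2, 0.3, 0.3]      # relative sizes, Table 9A.1
y = [0.3, 0.5, 0.8, 0.9, 1.5]      # population A, Table 9A.1
results = [murthy_estimate(z, y, s) for s in combinations(range(len(z)), 2)]
mean_m = sum(p * est for p, est in results)
var_m = sum(p * (est - mean_m) ** 2 for p, est in results)
print(round(mean_m, 4), round(var_m, 3))
```

The mean over all samples equals the population total $Y = 4.0$, and the variance can be compared with the entry for Murthy's method in Table 9A.2.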


We now prove that the estimate $\hat{Y}_M$ is unbiased. For any unit $i$ in the 
population, $\sum P(s|i) = 1$, taken over all samples having unit $i$ drawn first. To show 
units may be selected more than once in the sample, but the average frequency of 
selection is $\pi_i = nz_i$. This happens in the example for unit 3, since $3M_3' = 33 > 
30 = M_0'$. Unit 3 is chosen once if $1 \le r \le 12$ or $16 \le r \le 30$ (in all 27 choices), and 
twice if $13 \le r \le 15$ (3 choices). The average frequency of selection is $(1 \times 27 + 2 \times 
3)/30 = 33/30 = 3z_3$. 
It follows that 


(9A.61) 
is an unbiased estimate of Y. 


Hartley and Rao (1962) examined this method with the units first arranged in 
random order. With the restriction $nz_i < 1$ (all $i$), they obtained approximate 
expressions for $V(\hat{Y})$ and $v(\hat{Y})$ for this estimate. 

9A.11 THE RAO, HARTLEY, COCHRAN METHOD 


For a sample of size $n$, this method first forms $n$ random groups of units, one unit 
to be drawn from each group. The numbers of units $N_1, N_2, \ldots, N_n$ in the 
respective groups may be chosen in advance: we will see that it is advantageous to 
make the $N_g$ as equal as possible. If $Z_g$ is the total measure of size for group $g$, the 
$i$th unit in the group is given probability of selection $z_i/Z_g$. The estimate of the 
population total, Rao, Hartley, and Cochran (1962), is 

$$\hat{Y}_{RHC} = \sum_{g=1}^{n} Z_g\,\frac{y_g}{z_g} = \sum_{g=1}^{n} \hat{Y}_g \tag{9A.62}$$

where $y_g$, $z_g$ refer to the unit drawn from group $g$. 

Since the $Z_g$ will not be equal, this method does not keep the probabilities of 
selection proportional to the sizes, and there is some evidence that its estimator 
suffers a slight loss in precision. Its advantages are its simplicity and generality. 

In developing $V(\hat{Y}_{RHC})$ we average over two stages. Stage 1 is the randomization 
into groups, Stage 2 the selection of a unit within each group. For any specific 
split into groups, $\hat{Y}_g$ in (9A.62) is an unbiased estimator of the group total $Y_g$, and 
hence $E_2(\hat{Y}_{RHC}) = Y$. A well-known formula for finding a variance over two 
stages of sampling, proved in Chapter 10, is 

$$V(\hat{Y}_{RHC}) = E_1[V_2(\hat{Y}_{RHC})] + V_1[E_2(\hat{Y}_{RHC})] \tag{10.2}$$

Since $E_2(\hat{Y}_{RHC}) = Y$ and is a constant, it has zero variance and the second term in 
(10.2) disappears in this application. 

For $V_2(\hat{Y}_{RHC})$ we can use, within a group, the variance formula for sampling 
with replacement, since only one unit is selected from each group. Within a group 
the probabilities of selection are $z_i/Z_g$. By (9A.15) in section 9A.3 we get, for any 
specific split (with $n_g = 1$), 

$$V_2(\hat{Y}_g) = \sum_{i}^{N_g}\sum_{j>i}^{N_g} z_i z_j\left(\frac{y_i}{z_i} - \frac{y_j}{z_j}\right)^2 \tag{9A.63}$$
taken over the pairs of units in group $g$. Over the set of random subdivisions into 
groups, the probability that any pair of units falls in group $g$ is $N_g(N_g - 1)/ 
N(N - 1)$. Hence the average value of $V_2(\hat{Y}_g)$ over the randomization is 

$$E_1 V_2(\hat{Y}_g) = \frac{N_g(N_g - 1)}{N(N - 1)}\sum_{i}^{N}\sum_{j>i}^{N} z_i z_j\left(\frac{y_i}{z_i} - \frac{y_j}{z_j}\right)^2 = \frac{N_g(N_g - 1)}{N(N - 1)}\left(\sum_{i}^{N} \frac{y_i^2}{z_i} - Y^2\right) \tag{9A.64}$$

from (9A.63). Since $\hat{Y}_{RHC} = \sum_g \hat{Y}_g$, 

$$V(\hat{Y}_{RHC}) = E_1\left[\sum_{g}^{n} V_2(\hat{Y}_g)\right] = \frac{\left(\sum_g N_g^2 - N\right)}{N(N - 1)}\left(\sum_{i}^{N} \frac{y_i^2}{z_i} - Y^2\right) = \frac{n\left(\sum_g N_g^2 - N\right)}{N(N - 1)}\, V(\hat{Y}_{ppz}) \tag{9A.65}$$

Thus $V(\hat{Y}_{RHC})$ is simply a multiple of the variance of the estimate $\hat{Y}_{ppz}$ in sampling 
with replacement. If $N/n$ is integral, the choice $N_g = N/n$ minimizes the multiplier. 
In this case 

$$V(\hat{Y}_{RHC}) = \left(1 - \frac{n - 1}{N - 1}\right) V(\hat{Y}_{ppz}) \tag{9A.66}$$

If $N = nR + k$, with $R$ integral and $k < n$, the best choice is to make $k$ groups of 
size $(R + 1)$, the remaining $(n - k)$ of size $R$. This gives 

$$V(\hat{Y}_{RHC}) = \left[1 - \frac{n - 1}{N - 1} + \frac{k(n - k)}{N(N - 1)}\right] V(\hat{Y}_{ppz}) \tag{9A.67}$$

If $N/n$ is integral, (9A.66) gives $V(\hat{Y}_{RHC})/V(\hat{Y}_{ppz}) = (N - n)/(N - 1)$, the same 
ratio as obtained in simple random sampling. 
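
The multiplier in (9A.65) and the resulting variance can be checked by enumerating every random split and every within-group selection. The sketch below does so for $n = 2$ with groups of three and two units, using population A of Table 9A.1. The function names are hypothetical, and `v_ppz` uses the with-replacement variance $V(\hat{Y}_{ppz}) = (1/n)\sum z_i(y_i/z_i - Y)^2$.

```python
from itertools import combinations

def rhc_multiplier(group_sizes):
    """Factor multiplying V(Y_ppz) in (9A.65): n(sum N_g^2 - N) / (N(N - 1))."""
    N, n = sum(group_sizes), len(group_sizes)
    return n * (sum(g * g for g in group_sizes) - N) / (N * (N - 1))

def v_ppz(z, y, n):
    """Variance of the pps estimate in sampling with replacement."""
    Y = sum(y)
    return sum(zi * (yi / zi - Y) ** 2 for zi, yi in zip(z, y)) / n

def rhc_variance_by_enumeration(z, y, size_first):
    """Exact V(Y_RHC) for n = 2: every split into two groups is equally likely,
    and one unit is drawn from each group with probability z_i / Z_g."""
    N, Y = len(z), sum(y)
    splits = list(combinations(range(N), size_first))
    var = 0.0
    for g1 in splits:
        g2 = [i for i in range(N) if i not in g1]
        Z1, Z2 = sum(z[i] for i in g1), sum(z[i] for i in g2)
        for i in g1:
            for j in g2:
                prob = (1 / len(splits)) * (z[i] / Z1) * (z[j] / Z2)
                est = Z1 * y[i] / z[i] + Z2 * y[j] / z[j]       # (9A.62)
                var += prob * (est - Y) ** 2                    # estimator is unbiased
    return var

z = [0.1, 0.1, 0.2, 0.3, 0.3]      # relative sizes, Table 9A.1
y = [0.3, 0.5, 0.8, 0.9, 1.5]      # population A, Table 9A.1
print(rhc_multiplier([3, 2]), v_ppz(z, y, 2), rhc_variance_by_enumeration(z, y, 3))
```

With $N = 5$, $n = 2$, $R = 2$, and $k = 1$, the multiplier from (9A.67) is $1 - 1/4 + 1/20 = 0.8$, which the enumeration reproduces.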

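An unbiased variance estimator can be shown to be 

$$v(\hat{Y}_{RHC}) = \frac{\left(\sum_g N_g^2 - N\right)}{\left(N^2 - \sum_g N_g^2\right)}\sum_{g}^{n} Z_g\left(\frac{y_g}{z_g} - \hat{Y}_{RHC}\right)^2 \tag{9A.68}$$

With $N = nR + k$, and $k$ groups of size $(R + 1)$, formula (9A.68) becomes 

$$v(\hat{Y}_{RHC}) = \frac{N^2 + k(n - k) - Nn}{N^2(n - 1) - k(n - k)}\sum_{g}^{n} Z_g\left(\frac{y_g}{z_g} - \hat{Y}_{RHC}\right)^2 \tag{9A.69}$$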


9A.12 NUMERICAL COMPARISONS 


In the literature, comparisons of the performances of some of the methods have 
been made, particularly by Rao and Bayless (1969, 1970), in three situations: (a) 
on small artificial populations [e.g., the populations with $N = 4$, $n = 2$ constructed 
by Yates and Grundy (1953)], (b) under the linear regression model used in 
section 9A.5, and (c) on 20 natural populations. 

Seven methods are compared here on three artificial populations with $N = 5$, 
$n = 2$. The relative sizes $z_i$ of the units are the same in all three populations 
(A, B, C) (Table 9A.1). In A the mean per element, which is proportional to $y_i/z_i$, 
is uncorrelated with $z_i$. In B the mean per element rises as the sizes increase. In C 
the unit totals have little relation to the sizes. 


TABLE 9A.1 
THREE SMALL ARTIFICIAL POPULATIONS 


Relative sizes ($z_i$)          0.1    0.1    0.2    0.3    0.3 

Population A    $y_i$           0.3    0.5    0.8    0.9    1.5 
                $y_i/z_i$       3      5      4      3      5 

Population B    $y_i$           0.3    0.3    0.8    1.5    1.5 
                $y_i/z_i$       3      3      4      5      5 

Population C    $y_i$           0.7    0.6    0.4    0.9    0.6 
                $y_i/z_i$       7      6      2      3      2 

4. Brewer’s method, sampling without replacement. Estimate: $\hat{Y}_B = (1/n) \sum (y_i/z_i)$. 

5. Murthy’s method. Estimate: $\hat{Y}_M = \left[(1 - z_j)\dfrac{y_i}{z_i} + (1 - z_i)\dfrac{y_j}{z_j}\right] \Big/ (2 - z_i - z_j)$. 

6. The RHC method with one group of three units, one of two units. Estimate: 
$\hat{Y}_{RHC} = \sum Z_g (y_g/z_g)$. 
measures of size. The MSE’s of $\hat{Y}_R$ should be fairly close to the variances of the 
unequal-probability methods. 

In a choice of a method in practice, major considerations are the ease with 
which the sample can be drawn, the simplicity of the estimate, the accuracy of the 
estimate, and the availability of an estimator of the variance of the estimate. With 
$n = 2$, all methods are simple. As regards accuracy, the systematic method is 
shown here only in its most favorable case, which the sampler rarely has the 
knowledge to use. All methods except $\hat{Y}_{Syst}$ and $\hat{Y}_R$ provide unbiased sample 
estimates of error variance. Table 9A.2 presents the variances (with the MSE 
for $\hat{Y}_R$). 


TABLE 9A.2 
VARIANCES OF THE ESTIMATED POPULATION TOTALS 

                              Population 
Estimate                 A        B        C 
$\hat{Y}_{SRS}$          1.575    2.715    0.248 
$\hat{Y}_R$ (MSE)        0.344    0.351    1.421 
$\hat{Y}_{ppz}$          0.400    0.320    1.480 
$\hat{Y}_B$              0.246    0.248    1.251 
$\hat{Y}_M$              0.267    0.237    1.130 
$\hat{Y}_{RHC}$          0.320    0.256    1.184 
$\hat{Y}_{Syst}$         0.150    0.140    0.760 


In equal-probability selection, the ratio of $V(\hat{Y})$ in sampling without replacement 
to $V(\hat{Y})$ in sampling with replacement is $(N - n)/(N - 1) = 0.75$ for 
these populations. For $\hat{Y}_{RHC}$ the ratio $V(\hat{Y}_{RHC})/V(\hat{Y}_{ppz})$, where $\hat{Y}_{ppz}$ denotes 
sampling with replacement, is 0.8 in Table 9A.2. For the other unequal-
probability methods, the ratio varies from population to population. The average 
of the three ratios in A, B, and C is 0.74 for Brewer’s method and 0.73 for 
Murthy’s method, about the same ratio as for equal-probability methods. Results 
for $n = 3, 4$ on small natural populations by Bayless and Rao (1970) agree in 
general with $(N - n)/(N - 1)$. 

In populations A and B, all other methods are much more accurate than $\hat{Y}_{SRS}$ 
with simple random sampling. The ratio-to-size estimate with SRS performs 
roughly similarly to the unequal probability methods, although not quite as well. 
In the latter, there is little to choose among the Brewer, Murthy, or RHC 
methods. The systematic method at its best performs very well. 

In population C, in which the unit totals bear little relation to the sizes, simple 
random sampling with equal probabilities is much the best. As noted at the end of 
section 9A.13, this superiority is probably due to the estimator $N\bar{y}$, not the SRS 
feature. 

Rao and Bayless (1969) compared 10 unequal-probability methods in 20 
natural populations found in books and papers on sampling, with $N$ ranging from 
9 to 35. They confined themselves to methods (a) known to have smaller variances 
than $\hat{Y}_{ppz}$ and (b) providing a positive unbiased variance estimator. Among the 
methods presented here, they compared the efficiencies of $\hat{Y}_M$, $\hat{Y}_{RHC}$, and $\hat{Y}_{ppz}$ 
with that of $\hat{Y}_B$. For $n = 2$, there was little to choose among the three “without 
replacement” methods, with $\hat{Y}_M$ slightly ahead, beating $\hat{Y}_{RHC}$ whenever the two 
methods differed in precision. Also, we have noted (section 9A.7) that the 
variance estimator may be unstable for methods using the Horvitz-Thompson 
estimate. The Rao-Bayless results compare the coefficients of variation of $v(\hat{Y}_M)$, 
$v(\hat{Y}_{RHC})$, and $v(\hat{Y}_B)$ in the Yates-Grundy form with that of $v(\hat{Y}_{ppz})$, as measures 
of the stabilities of the variance estimators. Relative to $v(\hat{Y}_B)$ as 100%, the 
median efficiencies of the other variance estimators were: $v(\hat{Y}_{RHC}) = 109\%$, 
$v(\hat{Y}_M) = 104\%$, $v(\hat{Y}_{ppz}) = 97\%$, the three methods all showing a few large 
individual gains. 

Bayless and Rao (1970) give similar comparisons for $n = 3$ (14 populations) and 
$n = 4$ (10 populations). Sampford’s extension of the Brewer method was used. For 
$n$ much beyond 2, both Sampford’s and Murthy’s methods require computer aid 
in calculating the needed probabilities. The variances of $\hat{Y}_M$ and $\hat{Y}_S$ agreed closely 
in nearly all populations, with $\hat{Y}_{RHC}$ slightly behind, its median efficiency relative 
to $\hat{Y}_S$ dropping to 92% for $n = 4$. In stability of the variance estimators, on 
the other hand, the superiority of $\hat{Y}_{RHC}$ and $\hat{Y}_M$ increased, the median relative 
efficiencies being 118% ($n = 3$) and 129% ($n = 4$) for $v(\hat{Y}_{RHC})$ and 110% ($n = 3$) 
and 120% ($n = 4$) for $v(\hat{Y}_M)$ relative to 100% for $v(\hat{Y}_S)$. 

Rao and Bayless also compared the efficiencies of $\hat{Y}$ and $v(\hat{Y})$ for some of the 
estimators under the linear regression model of section 9A.5 with a = 0. While 
comparative results depended on the power $g$, the general trend was similar to that 
in the natural populations. 


9A.13 STRATIFIED AND RATIO ESTIMATES 


The preceding formulas have been presented as they would apply to a single 
stratum, although the concentration on small $n$ implied previous stratification by 
another principle (e.g., geographic location, urban-rural). The extension of the 
formulas to this stratification is as usual. For any method, if $\pi_{hi}$, $z_{hi}$, $\pi_{hij}$, $y_{hi}$, and so 
forth, refer to stratum $h$, 

$$\hat{Y} = \sum_h \hat{Y}_h; \qquad V(\hat{Y}) = \sum_h V(\hat{Y}_h); \qquad v(\hat{Y}) = \sum_h v(\hat{Y}_h) \tag{9A.70}$$

Ratio estimates enter either when the variable of interest is a ratio (e.g., 
unemployed females/females eligible for work), or when they are used to increase 
precision. In unequal-probability sampling a single choice of the $z_i$ or $z_{hi}$ must be 
made for all variables for which estimates are required. For instance, the $z_i$ or $z_{hi}$ 
may be proportional to the total recent sales of a type of business, where the 
survey has to estimate current sales of individual classes of items as well as current 
total sales. For some classes, sales may not be closely proportional to the $z_i$. In 
such cases, use of the familiar “ratio to the same variable last time” estimate may 
bring substantial increases in precision. 

With unequal-probability sampling the change in formulas to those for ratio 
estimators is easily made. In an unstratified population, replace $\hat{Y}$ for any method 
by $X\hat{Y}/\hat{X}$. For instance, with the HT estimator, we use $X(\sum y_i/\pi_i)/\sum (x_i/\pi_i)$ for 
the ratio form instead of $\sum (y_i/\pi_i)$. For the standard approximations to the MSE 
and estimated MSE of a ratio estimate, replace $y_i$ by $d_i = (y_i - Rx_i)$ in $V(\hat{Y})$ and by 
$d_i' = (y_i - \hat{R}x_i)$ in $v(\hat{Y})$. 

For instance, with the ratio form of the Horvitz-Thompson estimator, the 
approximation to $v_2$, the estimated variance in the Yates-Grundy form, is from 
(9A.44), p. 261, 

$$v_2[\hat{Y}_{HTR}] = \sum_{i}^{n}\sum_{j>i}^{n} \frac{(\pi_i\pi_j - \pi_{ij})}{\pi_{ij}}\left(\frac{d_i'}{\pi_i} - \frac{d_j'}{\pi_j}\right)^2 \tag{9A.71}$$

In a stratified population it is likely, with small $n_h$, that the combined ratio 
estimate (section 6.11) will be used [i.e., $\hat{Y}_{Rc} = X(\sum \hat{Y}_h)/(\sum \hat{X}_h)$]. For approximate 
variance formulas, replace $y_{hi}$ by $d_{hi} = y_{hi} - Rx_{hi}$ in $V$ and by $d_{hi}' = y_{hi} - \hat{R}x_{hi}$ in $v$. 

When the $y_i$ for an item are not related to the $z_i$ and no suitable ratio estimate is 
feasible for such an item, Rao (1966) has investigated alternative estimators. 
These are produced from any of the unequal-probability estimators (HT, M, 
RHC, etc.) by replacing $y_i/z_i$ by $Ny_i$, wherever $y_i$ appears in the estimator. Thus, 
for the Horvitz-Thompson estimator, the alternative form is 

$$\hat{Y}_{HT}' = \frac{N}{n}\sum_{i}^{n} y_i \tag{9A.72}$$

while for the RHC method 

$$\hat{Y}_{RHC}' = N\sum_{g}^{n} Z_g y_g \tag{9A.73}$$

The estimators are biased but intuition suggests that if the $y_i$ have no relation to 
the $z_i$ the biases should be relatively small. By the same method as used in finding 
$V(\hat{Y}_{HT})$ in (9A.42), we get 

$$V(\hat{Y}_{HT}') = \frac{N^2}{n^2}\sum_{i}^{N}\sum_{j>i}^{N} (\pi_i\pi_j - \pi_{ij})(y_i - y_j)^2 \tag{9A.74}$$

Thus $V[\hat{Y}_{HT}']$ depends on the amount of variation in the unit totals $y_i$, as does 
$V(\hat{Y}_{SRS})$. In population C, Table 9A.2, this method gave 0.266 and 0.280 for the 
MSE’s of the alternative forms of $\hat{Y}_{HT}$ and $\hat{Y}_M$, these forms doing almost as well as 
$\hat{Y}_{SRS}$. For both methods the (Bias)$^2$ term was about 4% of the MSE. 
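
The figure quoted for the alternative Horvitz-Thompson form in population C can be reproduced by enumeration. The sketch below assumes that the sample of $n = 2$ is drawn by Brewer's method (section 9A.8), so that each pair $(i, j)$ occurs with probability $\pi_{ij}$; the variable names are illustrative.

```python
from itertools import combinations

z = [0.1, 0.1, 0.2, 0.3, 0.3]      # relative sizes, Table 9A.1
y = [0.7, 0.6, 0.4, 0.9, 0.6]      # population C, Table 9A.1
N, n = len(z), 2

D = 0.5 * (1 + sum(zi / (1 - 2 * zi) for zi in z))                 # (9A.45)
pij = {(i, j): z[i] * z[j] / D * (1 / (1 - 2 * z[i]) + 1 / (1 - 2 * z[j]))
       for i, j in combinations(range(N), 2)}                       # (9A.47)
pi = [2 * zi for zi in z]                                           # Brewer: pi_i = 2 z_i

Y = sum(y)
# Alternative estimator (9A.72): (N/n) times the sum of the sample y's
est = {pair: N / n * (y[pair[0]] + y[pair[1]]) for pair in pij}
mean_est = sum(pij[p] * est[p] for p in pij)
var_direct = sum(pij[p] * (est[p] - mean_est) ** 2 for p in pij)
var_74 = (N / n) ** 2 * sum((pi[i] * pi[j] - pij[i, j]) * (y[i] - y[j]) ** 2
                            for i, j in pij)                        # (9A.74)
bias = mean_est - Y
mse = var_direct + bias ** 2
print(round(var_74, 4), round(bias, 4), round(mse, 3), round(bias ** 2 / mse, 3))
```

The variance from (9A.74) agrees with the direct calculation, and the squared bias is a small fraction of the MSE, in line with the figures given above.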


EXERCISES 


9A.1 Horvitz and Thompson (1952) give the following data for eye estimates $M_i$ of the 
numbers of households and for the actual numbers $y_i$ in 20 city blocks in Ames, Iowa. To 
assist in the calculations, values of $y_i/M_i$ and $y_i^2/M_i$ are also given. A sample of $n = 1$ block is 
chosen. Compute the variances of the total number of households $\hat{Y}$, as obtained by (a) the 
unbiased estimate in sampling with equal probabilities, (b) the ratio estimate in sampling 
with equal probabilities, (c) sampling with probability proportional to $M_i$. (For the ratio 
estimate, compute the true mean square error, not the approximate formula.) 


$M_i$   $y_i$   $y_i/M_i$   $y_i^2/M_i$  |  $M_i$   $y_i$   $y_i/M_i$   $y_i^2/M_i$ 

 9     9    1.0000     9.000  |  19    19    1.0000    19.000 
 9    13    1.4444    18.778  |  21    25    1.1905    29.762 
12    12    1.0000    12.000  |  23    27    1.1739    31.696 
12    12    1.0000    12.000  |  24    21    0.8750    18.375 
12    14    1.1667    16.333  |  24    35    1.4583    51.042 
14    17    1.2143    20.643  |  25    22    0.8800    19.360 
14    15    1.0714    16.071  |  26    25    0.9615    24.038 
17    20    1.1765    23.529  |  27    27    1.0000    27.000 
18    19    1.0556    20.056  |  30    47    1.5667    73.633 
18    18    1.0000    18.000  |  40    37    0.9250    34.225 


Do the results agree with the discussion in section 9A.5? 


9A.2 A questionnaire is to be sent to a sample of high schools to find out which schools 
provide certain facilities, for example, a course in Russian or a swimming pool. If $M_i$ is the 
number of students in the $i$th school, the quantity to be estimated for any given facility is the 
proportion $P$ of high-school students who are in schools having the facility, that is, 

$$P = \frac{\sum' M_i}{\sum_{i=1}^{N} M_i}$$

where $\sum'$ is a sum over those schools with the facility. 

A sample of $n$ schools is drawn with probability proportional to $M_i$ with replacement. 
For one facility, $a$ schools out of $n$ are found to possess it. (a) Show that $\hat{P} = a/n$ is an 
unbiased estimate of $P$ and that its true variance is $P(1 - P)/n$. (Hint. In the corollary to 
theorem 9.4 let $y_i = M_i$ if the school has the facility and 0 otherwise.) (b) Show that an 
unbiased estimate of $V(\hat{P})$ is $v(\hat{P}) = \hat{P}(1 - \hat{P})/(n - 1)$. 

9A.3 The large units in a population arrange themselves into a finite number of size 
classes: all units in class $h$ contain $M_h$ small units. (a) Under what conditions does sampling 
with pps give, on the average, the same distribution of the size classes in the sample as 
stratification by size of unit, with optimum allocation for fixed sample size? (b) If the 
variance among large units in class $h$ is $kM_h$, where $k$ is a constant for all classes, what 
system of probabilities of selection of the units gives a sample in which the sizes have 
approximately the same distribution as a stratified random sample with optimum allocation 
for fixed sample size? 

9A.4 For a population with $N = 3$, $z_i = \frac{1}{2}, \frac{1}{3}, \frac{1}{6}$ and $y_i = 7, 5, 2$, two units are drawn 
without replacement, the first with probability proportional to $z_i$, the second with probability 
proportional to the remaining sizes. (a) Verify that $\pi_1 = \frac{51}{60}$, $\pi_2 = \frac{44}{60}$, $\pi_3 = \frac{25}{60}$ and that 
$\pi_{12} = \frac{35}{60}$, $\pi_{13} = \frac{16}{60}$, $\pi_{23} = \frac{9}{60}$. (b) For this method of sample selection, compare the variances of 
$\hat{Y}_{HT}$ and $\hat{Y}_M$ and also compare them with the variance of $\hat{Y}_{RHC}$ using its method of sample 
selection. You may either construct all three possible estimates or use the variance 
formulas. (c) Show that the ratio of $V(\hat{Y}_M)$ to $V(\hat{Y}_{ppz})$ in sampling with replacement is close 
to the value $\frac{1}{2}$ that applies for equal probability sampling. 

9A.5 For the population in exercise 9A.4, a second variable had values $y_{2i} = 8, 5, 9$, not 
at all closely related to the $z_i$, so that with the sampling method used in exercise 9A.4, $\hat{Y}_{HT}$ 
and $\hat{Y}_M$ would be expected to perform poorly for this variable. For Rao’s estimator 
$\hat{Y}_{HT}' = 1.5(y_{2i} + y_{2j})$, compare its MSE with the variance of $\hat{Y}_{SRS} = 1.5(y_{2i} + y_{2j})$ 
in equal-probability sampling. How much does bias contribute to the MSE? 

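9A.6 For Brewer’s method with $n = 2$, section 9A.8 showed that 

$$\pi_{ij} = \frac{4z_i z_j(1 - z_i - z_j)}{(1 - 2z_i)(1 - 2z_j)} \Big/ \left(1 + \sum_{k=1}^{N} \frac{z_k}{1 - 2z_k}\right)$$

(a) Show that if every $z_i < \frac{1}{2}$, 

$$0 < \pi_{ij} < 4z_i z_j \qquad (\text{every } i \ne j)$$

(b) Show that this result makes the Yates-Grundy estimator of variance always positive 
for this method. 
Hint. For (a) it is sufficient to show that 

$$(1 - z_i - z_j) = (1 - 2z_i)(1 - 2z_j)\left[1 + \frac{z_i}{1 - 2z_i} + \frac{z_j}{1 - 2z_j}\right]$$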

9A.7 (a) For Durbin’s method with $n = 2$, verify directly that the probability that the 
$j$th unit is drawn second is $z_j$ as stated on p. 263. 

(b) With $N = 4$, $z_i = 0.1$, 0.2, 0.3, 0.4, calculate the probability that with Brewer’s 
method unit 1 is drawn first and the probability that it is drawn second. Verify that the two 
probabilities add to $0.2 = 2z_1$. 

9A.8 In Madow’s systematic method, a unit may be chosen more than once in the 
sample if $nz_i > 1$ (i.e., $nM_i' > M_0'$). Show (as stated on p. 266) that for such units the average 
frequency of selection is $nz_i$, so that the Horvitz-Thompson estimator of $Y$ remains 
unbiased for $nz_i > 1$. 
Subsampling with Units 
of Equal Size 


10.1 TWO-STAGE SAMPLING 


Suppose that each unit in the population can be divided into a number of 
smaller units, or subunits. A sample of n units has been selected. If subunits within 
a selected unit give similar results, it seems uneconomical to measure them all. A 
common practice is to select and measure a sample of the subunits in any chosen 
unit. This technique is called subsampling, since the unit is not measured com- 
pletely but is itself sampled. Another name, due to Mahalanobis, is two-stage 
sampling, because the sample is taken in two steps. The first is to select a sample of 
units, often called the primary units, and the second is to select a sample of 
second-stage units or subunits from each chosen primary unit. 

Subsampling has a great variety of applications, which go far beyond the 
immediate scope of sample surveys. Whenever any process involves chemical, 
physical, or biological tests that can be performed on a small amount of 
material, it is likely to be drawn as a subsample from a larger amount that is itself 
a sample. 

In this chapter we consider the simplest case in which every unit contains the 
same number $M$ of subunits, of which $m$ are chosen when any unit is subsampled. 
A schematic representation of a two-stage sample, in which $M = 9$ and $m = 2$, is 
shown in Fig. 10.1. 

The principal advantage of two-stage sampling is that it is more flexible than 
one-stage sampling. It reduces to one-stage sampling when m = M but, unless this 
is the best choice for m, we have the opportunity of taking some smaller value that 
appears more efficient. As usual, the issue reduces to a balance between statistical 
precision and cost. When subunits in the same unit agree very closely, considerations 
of precision suggest a small value of $m$. On the other hand, it is sometimes 
almost as cheap to measure the whole of a unit as to subsample it, for example, 
when the unit is a household and a single respondent can give accurate data about 
all members of the household. 


Fig. 10.1 Schematic representation of two-stage sampling ($N = 81$, $n = 5$, $M = 9$, $m = 2$). 
The marked symbol denotes an element in the sample. 


10.2 FINDING MEANS AND VARIANCES IN 
TWO-STAGE SAMPLING 


In two-stage sampling the sampling plan gives first a method for selecting n 
units. Then, for each selected unit, a method is given for selecting the specified 
number of subunits from it. In finding the mean and variance of an estimate, 
averages must be taken over all samples that can be generated by this two-stage 
process. One way of calculating this average is first to average the estimate over all 
second-stage selections that can be drawn from a fixed set of n units that the plan 
selects. Then we average over all possible selections of n units by the plan. For an 
estimate $\hat{\theta}$, this method can be expressed as 

$$E(\hat{\theta}) = E_1[E_2(\hat{\theta})] \tag{10.1}$$

where $E$ denotes expected or average value over all samples, $E_2$ denotes 
averaging over all possible second-stage selections from a fixed set of units, and $E_1$ 
denotes averaging over all first-stage selections. 

For $V(\hat{\theta})$ this method gives the following easily remembered result. 

$$V(\hat{\theta}) = V_1[E_2(\hat{\theta})] + E_1[V_2(\hat{\theta})] \tag{10.2}$$

where $V_2(\hat{\theta})$ is the variance over all possible subsample selections for a given set of 
units. To show this, let $\theta = E(\hat{\theta})$ (where $\theta$ is not necessarily the quantity that $\hat{\theta}$ is 
designed to estimate, since $\hat{\theta}$ may be biased). By definition, 

$$V(\hat{\theta}) = E(\hat{\theta} - \theta)^2 = E_1 E_2(\hat{\theta} - \theta)^2 \tag{10.3}$$

But 

$$E_2(\hat{\theta} - \theta)^2 = E_2(\hat{\theta}^2) - 2\theta E_2(\hat{\theta}) + \theta^2 \tag{10.4}$$

$$= [E_2(\hat{\theta})]^2 + V_2(\hat{\theta}) - 2\theta E_2(\hat{\theta}) + \theta^2 \tag{10.5}$$

Average now over first-stage selections. Since $E_1 E_2(\hat{\theta}) = \theta$, 

$$V(\hat{\theta}) = E_1[E_2(\hat{\theta})]^2 - \theta^2 + E_1[V_2(\hat{\theta})] \tag{10.6}$$

$$= V_1[E_2(\hat{\theta})] + E_1[V_2(\hat{\theta})] \tag{10.7}$$

Formula (10.7) extends naturally to three or more stages. For three-stage 
sampling, 

$$V(\hat{\theta}) = V_1\{E_2[E_3(\hat{\theta})]\} + E_1\{V_2[E_3(\hat{\theta})]\} + E_1\{E_2[V_3(\hat{\theta})]\} \tag{10.7'}$$
10.3 VARIANCE OF THE ESTIMATED MEAN IN 
TWO-STAGE SAMPLING 
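The following notation is used. 

$y_{ij}$ = value obtained for the $j$th subunit in the $i$th primary unit 

$\bar{y}_i = \sum_{j=1}^{m} y_{ij}/m$ = sample mean per subunit in the $i$th primary unit 

$\bar{\bar{y}} = \sum_{i=1}^{n} \bar{y}_i/n$ = over-all sample mean per subunit 

$S_1^2 = \sum_{i=1}^{N} (\bar{Y}_i - \bar{\bar{Y}})^2/(N - 1)$ = variance among primary unit means 

$S_2^2 = \sum_{i=1}^{N}\sum_{j=1}^{M} (y_{ij} - \bar{Y}_i)^2/N(M - 1)$ = variance among subunits within primary units 

Here $\bar{Y}_i = Y_i/M$ is the mean per subunit in the $i$th primary unit and $\bar{\bar{Y}}$ is the 
population mean per subunit. Note that $Y_i$ denotes the total over all subunits in the 
$i$th unit (denoted by $y_i$ in Chapters 9 and 9A). 

Theorem 10.1. If the $n$ units and the $m$ subunits from each chosen unit are 
selected by simple random sampling, $\bar{\bar{y}}$ is an unbiased estimate of $\bar{\bar{Y}}$ with 
variance 

$$V(\bar{\bar{y}}) = \left(\frac{N - n}{N}\right)\frac{S_1^2}{n} + \left(\frac{M - m}{M}\right)\frac{S_2^2}{mn} \tag{10.8}$$

Proof. With simple random sampling at both stages, 

$$E(\bar{\bar{y}}) = E_1[E_2(\bar{\bar{y}})] = E_1\left(\frac{1}{n}\sum_{i}^{n} \bar{Y}_i\right) = \frac{1}{N}\sum_{i}^{N} \bar{Y}_i = \bar{\bar{Y}} \tag{10.9}$$

For $V(\bar{\bar{y}})$, we use formula (10.2). 

$$V(\bar{\bar{y}}) = V_1[E_2(\bar{\bar{y}})] + E_1[V_2(\bar{\bar{y}})] \tag{10.10}$$

Since $E_2(\bar{\bar{y}}) = \sum_{i}^{n} \bar{Y}_i/n$, the first term on the right is the variance of the mean per 
subunit for a one-stage simple random sample of $n$ units. Hence, by 
Theorem 2.2, 

$$V_1[E_2(\bar{\bar{y}})] = \left(\frac{N - n}{N}\right)\frac{S_1^2}{n} \tag{10.11}$$

Furthermore, with $\bar{\bar{y}} = \sum \bar{y}_i/n$ and simple random sampling used at the second 
stage, 

$$V_2(\bar{\bar{y}}) = \left(\frac{M - m}{M}\right)\frac{\sum_{i}^{n} S_{2i}^2}{mn^2} \tag{10.12}$$

where $S_{2i}^2 = \sum_{j}^{M} (y_{ij} - \bar{Y}_i)^2/(M - 1)$ is the variance among subunits for the $i$th 
primary unit. When we average over the first-stage samples, $\sum_{i}^{n} S_{2i}^2/n$ averages to 
$\sum_{i}^{N} S_{2i}^2/N = S_2^2$. 

Hence 

$$E_1[V_2(\bar{\bar{y}})] = \left(\frac{M - m}{M}\right)\frac{S_2^2}{mn} \tag{10.13}$$

The theorem follows from formula (10.10), on adding (10.11) and (10.13). 
If $f_1 = n/N$ and $f_2 = m/M$ are the sampling fractions in the first and second 
stages, an alternative form of the result is 

$$V(\bar{\bar{y}}) = \frac{1 - f_1}{n}\, S_1^2 + \frac{1 - f_2}{mn}\, S_2^2 \tag{10.14}$$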
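
Theorem 10.1 can be checked on a very small population by listing every possible two-stage sample. The population below ($N = 3$ units of $M = 3$ subunits, with $n = 2$ and $m = 2$) is a made-up example, not data from the text; the calculation uses only Python's standard `statistics` module.

```python
from itertools import combinations, product
from statistics import mean, pvariance, variance

# Hypothetical population: N = 3 primary units, each with M = 3 subunits
pop = [[1.0, 2.0, 6.0], [4.0, 4.0, 7.0], [0.0, 5.0, 10.0]]
N, M, n, m = 3, 3, 2, 2

unit_means = [mean(u) for u in pop]
S1sq = variance(unit_means)                 # variance among unit means, divisor N - 1
S2sq = mean(variance(u) for u in pop)       # within-unit variance, divisor N(M - 1)
V_formula = (N - n) / N * S1sq / n + (M - m) / M * S2sq / (m * n)    # (10.8)

# With simple random sampling at both stages every two-stage sample is
# equally likely, so plain averages over the list give E and V.
estimates = [mean(mean(sub) for sub in subs)
             for units in combinations(range(N), n)
             for subs in product(*(combinations(pop[i], m) for i in units))]

print(mean(estimates), mean(unit_means))    # unbiasedness, (10.9)
print(pvariance(estimates), V_formula)      # variance, (10.8)
```

Because the enumeration covers all 27 samples, the agreement is exact apart from rounding error, and the two components of (10.8) can be inspected separately by changing `n` or `m`.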
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10.4 SAMPLE ESTIMATION OF THE VARIANCE 


Theorem 10.2. Under the conditions of theorem 10.1, an unbiased estimate 
of V(¥) is 


= 34 =f; 2 fil -f) 2 10.15) 
A le ( 
where fı = n/N, f2=m/M, and 
ria ane ZE- Hi)? 
p=-20 2 see! ae (10.16) 
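Proof.

$$(n-1)s_1^2 = \sum_{i=1}^{n}(\bar{y}_i - \bar{\bar{y}})^2 = \sum_{i=1}^{n}\bar{y}_i^2 - n\bar{\bar{y}}^2 \tag{10.17}$$

Hence

$$(n-1)E_2(s_1^2) = \sum_{i=1}^{n}\bar{Y}_i^2 + \frac{(1-f_2)}{m}\sum_{i=1}^{n}S_{2i}^2 - n\bar{\bar{Y}}_n^2 - \frac{(1-f_2)}{mn}\sum_{i=1}^{n}S_{2i}^2 \tag{10.18}$$

where $\bar{\bar{Y}}_n = \sum_{i=1}^{n}\bar{Y}_i/n$. The last term on the right holds because subsampling is
independent in different units and $\bar{\bar{y}} = \sum\bar{y}_i/n$. Thus,

$$(n-1)E_2(s_1^2) = \sum_{i=1}^{n}(\bar{Y}_i - \bar{\bar{Y}}_n)^2 + \frac{(n-1)(1-f_2)}{mn}\sum_{i=1}^{n}S_{2i}^2 \tag{10.19}$$

Multiplying by $(1-f_1)/n(n-1)$ and averaging over the first stage of simple
random sampling,

$$\frac{(1-f_1)}{n}E(s_1^2) = \frac{(1-f_1)}{n}S_1^2 + \frac{(1-f_1)(1-f_2)}{mn}S_2^2 \tag{10.20}$$

By comparison with (10.14) for $V(\bar{\bar{y}})$, note that the term in $S_2^2$ is too small by the
amount $f_1(1-f_2)S_2^2/mn$. Since $E_1E_2(s_2^2) = S_2^2$, an unbiased estimate of $V(\bar{\bar{y}})$ is
therefore

$$v(\bar{\bar{y}}) = \frac{1-f_1}{n}s_1^2 + \frac{f_1(1-f_2)}{mn}s_2^2 \tag{10.21}$$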
Corollary. A result used later is that from (10.20),

$$E(s_1^2) = S_1^2 + \frac{1-f_2}{m}S_2^2 = S_1^2 + \frac{S_2^2}{m} - \frac{S_2^2}{M} \tag{10.22}$$

It follows that an unbiased estimate of $S_1^2$ is

$$\hat{S}_1^2 = s_1^2 - \frac{(1-f_2)}{m}s_2^2$$
Notes on Theorem 10.2. If $m = M$, that is, $f_2 = 1$, formula (10.15) becomes that
appropriate to simple random sampling of the units. If $n = N$, the formula is that
for proportional stratified random sampling, since primary units may then be
regarded as strata, all of which are sampled. In this connection, two-stage
sampling is a kind of incomplete stratification, with the units as strata.

When $f_1 = n/N$ is negligible, we obtain the simple result,

$$v(\bar{\bar{y}}) = \frac{s_1^2}{n} = \frac{\sum_{i=1}^{n}(\bar{y}_i - \bar{\bar{y}})^2}{n(n-1)} \tag{10.23}$$

Thus the estimated variance can be computed from a knowledge of the unit means
only. This result is helpful when subsampling is systematic, because in this event
we cannot compute an unbiased estimate of $S_2^2$. But (10.23) still applies, provided
that $n/N$ is small. If $n/N$ is not small, (10.23) overestimates by the amount
$f_1S_1^2/n$, as seen from (10.20) and (10.14).
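
A minimal sketch of how (10.15), (10.16), and the approximation (10.23) might be computed from sample data follows. The function name, argument layout, and the small data set are hypothetical; the sample is assumed to be a list of $n$ lists, each holding the $m$ observed subunit values.

```python
from statistics import mean


def two_stage_variance_estimate(sample, N, M):
    """Return (ybar, v, v_approx) for a two-stage sample with equal-sized units.

    v follows (10.15); v_approx follows (10.23), which neglects f1.
    """
    n, m = len(sample), len(sample[0])
    unit_means = [mean(unit) for unit in sample]
    ybar = mean(unit_means)
    # s1^2: variance between unit means; s2^2: pooled variance within units
    s1_sq = sum((yi - ybar) ** 2 for yi in unit_means) / (n - 1)
    s2_sq = sum(
        (y - yi) ** 2 for unit, yi in zip(sample, unit_means) for y in unit
    ) / (n * (m - 1))
    f1, f2 = n / N, m / M
    v = (1 - f1) * s1_sq / n + f1 * (1 - f2) * s2_sq / (m * n)
    v_approx = s1_sq / n
    return ybar, v, v_approx


# Hypothetical data: n = 3 units out of N = 10, m = 2 subunits out of M = 4.
ybar, v, v_approx = two_stage_variance_estimate([[1, 3], [2, 6], [5, 5]], N=10, M=4)
print(ybar, v, v_approx)
```

With $f_1 = 0.3$ here, the approximate value exceeds the estimate from (10.15), which illustrates the overestimation noted above when $n/N$ is not small.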


10.5 THE ESTIMATION OF PROPORTIONS 


If the subunits are classified into two classes and we estimate the proportion that
falls in the first class, the preceding formulas can be applied by the usual device of
defining $y_{ij}$ as 1 if the corresponding subunit falls into this class and as zero
otherwise. Let $p_i = a_i/m$ be the proportion falling in the first class in the subsample
from the $i$th unit. The two estimated variances $s_1^2$ and $s_2^2$ required for theorem
10.2 work out as follows:

$$s_1^2 = \frac{\sum_{i=1}^{n}(p_i - \bar{p})^2}{n-1}, \qquad s_2^2 = \frac{m}{n(m-1)}\sum_{i=1}^{n}p_iq_i$$

where $\bar{p} = \sum p_i/n$ and $q_i = 1 - p_i$. Consequently, by theorem 10.2,

$$v(\bar{p}) = \frac{1-f_1}{n(n-1)}\sum_{i=1}^{n}(p_i - \bar{p})^2 + \frac{f_1(1-f_2)}{n^2(m-1)}\sum_{i=1}^{n}p_iq_i \tag{10.24}$$


Example. In a study of plant disease the plants were grown in 160 small plots
containing nine plants each. A random sample of 40 plots was chosen and three random
plants in each sampled plot were examined for the presence of disease. It was found that 22
plots had no diseased plants (out of three), 11 had one, 4 had two, and 3 had three. Estimate
the proportion of diseased plants and its s.e. The symbol $f$ denotes the frequencies 22, 11,
4, 3.

We have $N = 160$, $M = 9$, $n = 40$, $m = 3$. In finding $s_1^2$ and $s_2^2$, it is convenient to work at
first with the numbers of diseased plants ($3p_i$) and the numbers of healthy plants ($3q_i$). The
calculations are set out as follows.

           Frequency
 $3p_i$      $f$      $9p_iq_i$   $9fp_iq_i$   $3fp_i$   $9fp_i^2$
   0         22          0           0           0          0
   1         11          2          22          11         11
   2          4          2           8           8         16
   3          3          0           0           9         27
 Totals      40                     30          28         54

$$\bar{p} = \frac{28}{(3)(40)} = 0.233$$

$$\sum(p_i - \bar{p})^2 = \frac{1}{9}\left(54 - \frac{(28)^2}{40}\right) = 3.822$$

$$\sum p_iq_i = \frac{30}{9} = 3.333$$

Hence, from the formula immediately before this example,

$$v(\bar{p}) = \frac{(3)(3.822)}{(4)(40)(39)} + \frac{(2)(3.333)}{(4)(3)(1600)(2)} = 0.00201$$

The proportion diseased is 0.233 with s.e. 0.045. The approximate formula $s_1/\sqrt{n}$, from
(10.23), gives 0.049, a reasonably good estimate considering that $f_1 = \frac{1}{4}$.
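
The arithmetic of the plant-disease example can be retraced with a short script. This is only a sketch: the variable names are assumptions, and the data are the frequencies quoted in the example.

```python
import math

N, M, n, m = 160, 9, 40, 3
# number of diseased plants (out of 3) -> number of plots with that count
counts = {0: 22, 1: 11, 2: 4, 3: 3}

# one p_i per sampled plot
p = [k / m for k, freq in counts.items() for _ in range(freq)]
p_bar = sum(p) / n
ss_between = sum((pi - p_bar) ** 2 for pi in p)   # sum of (p_i - p_bar)^2
sum_pq = sum(pi * (1 - pi) for pi in p)           # sum of p_i * q_i

f1, f2 = n / N, m / M
# formula (10.24)
v = (1 - f1) * ss_between / (n * (n - 1)) + f1 * (1 - f2) * sum_pq / (n**2 * (m - 1))
se = math.sqrt(v)
# approximate formula s1 / sqrt(n), from (10.23)
se_approx = math.sqrt(ss_between / (n * (n - 1)))

print(round(p_bar, 3), round(v, 5), round(se, 3), round(se_approx, 3))
# prints: 0.233 0.00201 0.045 0.049
```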


10.6 OPTIMUM SAMPLING AND SUBSAMPLING FRACTIONS 


These depend on the type of cost function. If travel costs between units are
unimportant, one form that has proved useful is

$$C = c_1n + c_2nm$$

The first component of cost, $c_1n$, is proportional to the number of primary
units in the sample; the second, $c_2nm$, is proportional to the total number of
second-stage units or elements. From theorem 10.1, $V(\bar{\bar{y}})$ may be written

$$V(\bar{\bar{y}}) = \frac{1}{n}\left(S_1^2 - \frac{S_2^2}{M}\right) + \frac{1}{mn}S_2^2 - \frac{1}{N}S_1^2 \tag{10.25}$$


The last term on the right does not depend on the choice of $n$ and $m$. Minimizing $V$
for fixed $C$, or $C$ for fixed $V$, is equivalent to minimizing the product

$$\left(V + \frac{1}{N}S_1^2\right)C = \left[\left(S_1^2 - \frac{S_2^2}{M}\right) + \frac{S_2^2}{m}\right](c_1 + c_2m)$$

By the Cauchy-Schwarz inequality,

$$m_{opt} = \frac{S_2}{\sqrt{S_1^2 - S_2^2/M}}\sqrt{c_1/c_2} \tag{10.26}$$

provided $S_1^2 > S_2^2/M$. Round $m_{opt}$ to the nearest integer. If $m$ is an integer such
that $m < m_{opt} < m + 1$, a slightly better rule for $m_{opt}$ small is (Cameron, 1951):
round up if $m_{opt}^2 > m(m + 1)$, otherwise round down. If $m_{opt} > M$ or if $S_1^2 < S_2^2/M$,
take $m = M$, using one-stage sampling. (The product $(V + S_1^2/N)C$ is a strictly
decreasing function of $m$ when $S_1^2 < S_2^2/M$.)

The value of n is found by solving either the cost equation or the variance 
equation, depending on which has been preassigned. 
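
The rule in (10.26), together with Cameron's rounding rule and the fallback to one-stage sampling, might be coded as in the sketch below. The function name and signature are assumptions introduced for illustration.

```python
import math


def optimum_m(S1_sq, S2_sq, M, c1, c2):
    """Subsample size from (10.26), rounded by Cameron's rule."""
    Su_sq = S1_sq - S2_sq / M
    if Su_sq <= 0:
        return M                      # one-stage sampling
    m_opt = math.sqrt(S2_sq / Su_sq) * math.sqrt(c1 / c2)
    if m_opt >= M:
        return M                      # cannot subsample more than M subunits
    lower = math.floor(m_opt)
    # round up when m_opt^2 > m(m + 1), otherwise round down
    m = lower + 1 if m_opt ** 2 > lower * (lower + 1) else lower
    return min(m, M)


# S2 = 1.3 * Su with a large M, so that S1^2 is close to Su^2 = 1
print(optimum_m(1 + 1.69 / 1000, 1.69, 1000, c1=10, c2=1))    # m_opt is about 4.1
print(optimum_m(1 + 1.69 / 1000, 1.69, 1000, c1=11.9, c2=1))  # m_opt is about 4.48
```

In the second call $m_{opt}^2 \approx 20.1$ exceeds $4 \times 5 = 20$, so Cameron's rule rounds up to 5 even though the nearest integer to 4.48 is 4.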

In most practical situations the optimum is relatively flat. An error of a few units
in the choice of $m$ produces only a small loss of precision, as the following example
illustrates. Write

$$S_u^2 = S_1^2 - \frac{S_2^2}{M} \tag{10.27}$$

Example. Let

$$c_1 = 10c_2, \qquad S_2 = 1.3S_u$$

then

$$m_{opt} = 1.3\sqrt{10} = 4.1$$

We will regard total cost as fixed and see how the variance of $\bar{\bar{y}}$ changes with $m$. $N$ is
assumed large. From (10.14),

$$V(\bar{\bar{y}}) = \frac{S_u^2}{n} + \frac{S_2^2}{nm} = \frac{1}{n}\left(S_u^2 + \frac{S_2^2}{m}\right) = \frac{(c_1 + c_2m)}{C}\left(S_u^2 + \frac{S_2^2}{m}\right)$$

eliminating $n$ by means of the cost equation. This gives

$$V(\bar{\bar{y}}) = \frac{c_2S_u^2}{C}\left(1 + \frac{S_2^2}{mS_u^2}\right)(10 + m) = \frac{c_2S_u^2}{C}\left(1 + \frac{1.69}{m}\right)(10 + m)$$

Omitting the constant factor, the relative variance can be calculated for different values of
$m$. Table 10.1 shows these variances and the relative precisions (with the maximum
precision for $m = 4$ taken as the standard).
For any value of $m$ between 2 and 9, the loss of precision relative to the optimum is less
than 12%.

TABLE 10.1
RELATIVE VARIANCES AND PRECISIONS FOR DIFFERENT VALUES OF m

m =                1      2      3      4      5      6      7      8      9     10

Rel. variance    29.59  22.14  20.32  19.92  20.07  20.51  21.10  21.80  22.56  23.38
Rel. precision    0.67   0.90   0.98   1.00   0.99   0.97   0.94   0.91   0.88   0.85
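
The entries of Table 10.1 can be regenerated from the relative variance $(1 + 1.69/m)(10 + m)$. The sketch below does this; the function name and defaults are assumptions, and small last-digit differences from the printed table can arise from rounding.

```python
def relative_variance(m, cost_ratio=10.0, var_ratio=1.69):
    """(1 + (S2^2/Su^2)/m) * (c1/c2 + m): the variance up to a constant factor."""
    return (1 + var_ratio / m) * (cost_ratio + m)


best = relative_variance(4)  # m = 4 is taken as the standard
for m in range(1, 11):
    rv = relative_variance(m)
    print(f"m={m:2d}  rel. variance={rv:6.2f}  rel. precision={best / rv:4.2f}")
```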


In practice, the choice of $m$ requires estimates of $c_1/c_2$ and $S_2^2/S_u^2$, or equivalently
$S_2/S_u$. Because of the flatness of the optimum, these ratios need not be
obtained with high accuracy. If $c_1/c_2$ is known reasonably well and a value of $m$,
say $m_0$, has been selected, a useful table (Brooks, 1955) shows the range of values
of $S_2^2/S_u^2$ within which this $m_0$ gives a precision at least 90% of the optimum.

The table was obtained as follows. For given cost, assuming $N$ large, the relative
precision of $m_0$ to $m_{opt}$ is found to be

$$\frac{V(\bar{\bar{y}} \mid m_{opt})}{V(\bar{\bar{y}} \mid m_0)} = \frac{(S_u\sqrt{c_1} + S_2\sqrt{c_2})^2}{S_u^2c_1 + S_2^2c_2 + m_0c_2S_u^2 + c_1S_2^2/m_0} \tag{10.28}$$

The set of values of $S_2/S_u$ for which this expression exceeds some assigned level $L$
are those lying between the two roots

$$\frac{S_2}{S_u} = \frac{\gamma \pm \sqrt{L(1-L)}\,(\sqrt{m_0} + \gamma^2/\sqrt{m_0})}{(L\gamma^2/m_0) - (1-L)} \tag{10.29}$$

where $\gamma^2 = c_1/c_2$.
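
A sketch of how the two roots in (10.29) might be evaluated is given below. The function name is hypothetical. The handling of a non-positive denominator (no finite upper limit) is an assumption that follows from solving the inequality in (10.28) directly, not something stated in the text.

```python
import math


def brooks_limits(c1_over_c2, m0, L=0.9):
    """Lower and upper limits of S2^2/Su^2 for which m0 attains relative precision L."""
    g = math.sqrt(c1_over_c2)                      # gamma
    s = math.sqrt(L * (1 - L)) * (math.sqrt(m0) + g * g / math.sqrt(m0))
    d = L * g * g / m0 - (1 - L)                   # denominator of (10.29)
    if d > 0:
        lo, hi = (g - s) / d, (g + s) / d
    elif d < 0:
        lo, hi = (g - s) / d, math.inf             # no finite upper limit
    else:
        lo, hi = (L * m0 - g * g * (1 - L)) / (2 * g), math.inf
    lo = max(lo, 0.0)
    return lo ** 2, hi ** 2                        # limits for the ratio of variances


print(brooks_limits(1, 4))     # approximately (4, 196)
print(brooks_limits(16, 18))   # approximately (5.2, 83.7)
```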

Table 10.2, adapted from Brooks (1955), shows the lower and upper limits of
$S_2^2/S_u^2$ for $L = 0.9$. The wide interval between the lower and upper limits is
striking in nearly all cases. Note that the range of $m_0$ changes in different parts of
the table.

If we have a rough idea about the values of $S_2^2/S_u^2$ for the principal items in a
survey, Table 10.2 may be used to select a value $m_0$. Note that if $\rho$ is the
correlation between elements in the same primary unit, as defined in section 9.4,
the ratio $S_2^2/S_u^2$ is nearly equal to $(1-\rho)/\rho$. A value of $S_2^2/S_u^2$ as low as 1
corresponds to $\rho = 0.5$. This would be an unusually high degree of intraunit
correlation. Similarly, $\rho = 0.1$ gives $S_2^2/S_u^2 = 9$, whereas $\rho = 0.01$ gives $S_2^2/S_u^2 = 99$.

Example. Suppose that $c_1/c_2$ is about 1 and that $S_2^2/S_u^2$ is expected to lie between 5
and 100 for the principal items. The column $c_1/c_2 = 1$ suggests that $m_0 = 4$ is appropriate,
since this covers ratios from 4 to more than 100 (actually to 196). With $c_1/c_2 = 16$ and the
same desired range, the table suggests a value of $m_0$ somewhere between 15 and 20.
Further calculation from (10.29) shows that $m_0 = 18$ is best. This covers the range from 5.2
to 84, not quite so wide as desired.

TABLE 10.2
LIMITS FOR $S_2^2/S_u^2$ WITHIN WHICH $m_0$ GIVES AT LEAST 90% OF THE MAXIMUM
PRECISION

[Table body not recoverable in this copy.]

* > denotes “> 100.”


When the cost of travel between primary units is substantial, a more accurate
cost function may be

$$C = c_1n + c_t\sqrt{n} + c_2nm \tag{10.30}$$

since travel costs tend to be proportional to $\sqrt{n}$. If a desired value of $V(\bar{\bar{y}})$ has been
specified, pairs of values of $(n, m)$ that give this variance are easily computed from
(10.25) for $V(\bar{\bar{y}})$. The costs for different combinations are then computed from
(10.30) and the combination giving the smallest cost is found. When cost is fixed in
advance, Hansen, Hurwitz, and Madow (1953) give a method for determining the
$(n, m)$ combination that minimizes the variance and a table that facilitates a rapid
choice. Note that their $n$ is our $m$ and vice versa.
10.7 ESTIMATION OF $m_{opt}$ FROM A PILOT SURVEY


Sometimes estimates of $S_2^2$ and $S_u^2$ are obtained from a pilot survey in which $n'$
primary units are chosen, with $m'$ elements taken from each unit. This section
deals with the choice of $n'$ and $m'$. If $s_1^2$ is the variance between unit means and $s_2^2$
is the variance between subunits within units, as defined in section 10.4, (10.22)
gives

$$E\left(s_1^2 - \frac{s_2^2}{m'}\right) = S_1^2 - \frac{S_2^2}{M} = S_u^2 \tag{10.31}$$

For the simple cost function $c_1n + c_2nm$, we had

$$m_{opt} = \frac{S_2}{S_u}\sqrt{c_1/c_2}$$

As an estimate of $m_{opt}$ from the pilot survey, (10.31) suggests that we take

$$\hat{m}_{opt} = \frac{s_2}{\sqrt{s_1^2 - s_2^2/m'}}\sqrt{c_1/c_2} = \frac{\sqrt{m'}\sqrt{c_1/c_2}}{\sqrt{(m's_1^2/s_2^2) - 1}} \tag{10.32}$$

The estimate $\hat{m}_{opt}$ is subject to a sampling error that depends on the sampling error
of the ratio $s_1^2/s_2^2$. From the analysis of variance it is known that $m's_1^2/s_2^2$ is
distributed as

$$F\left(1 + \frac{m'S_u^2}{S_2^2}\right)$$

where $F$ has $(n' - 1)$ and $n'(m' - 1)$ degrees of freedom, provided that the $y_{ij}$ are
normally distributed. This result leads to the sampling distribution of $\hat{m}_{opt}$, for
given values of $n'$ and $m'$, that is,

$$\hat{m}_{opt} = \frac{\sqrt{m'}\sqrt{c_1/c_2}}{\sqrt{F\left(1 + \dfrac{m'S_u^2}{S_2^2}\right) - 1}} \tag{10.33}$$


Example. For the example in section 10.6, in which

$$c_1 = 10c_2, \qquad S_2 = 1.3S_u, \qquad m_{opt} = 1.3\sqrt{10} = 4.1$$

consider how well $m_{opt}$ is estimated from a pilot sample with $n' = 10$ and $m' = 4$. From
(10.33),

$$\hat{m}_{opt} = \frac{6.324}{\sqrt{F[1 + (4/1.69)] - 1}} = \frac{6.324}{\sqrt{3.367F - 1}}$$

where $F$ has 9 and 30 df. To find the limits within which $\hat{m}_{opt}$ will lie 80% of the time we
have, from the 10% one-tailed significance levels of $F$,

$$F_{.10}(9, 30) = 1.8490, \qquad F_{.90}(9, 30) = 1/F_{.10}(30, 9) = 1/2.2547 = 0.4435$$

Substitution of these values of $F$ gives

lower limit, $\hat{m}_{opt} = 2.8$; upper limit, $\hat{m}_{opt} = 9.0$

As shown in Table 10.1, any $m$ in this range gives a degree of precision close to the
optimum. Thus, with $n' = 10$, $m' = 4$, the chances are 8 in 10 that the loss of precision is
small with normal data.
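
The two limits in this example follow from substituting the quoted $F$ values into (10.33). A small sketch of that substitution is shown below; the function name and argument names are assumptions, and the $F$ quantiles are simply the ones quoted in the example rather than being computed.

```python
import math


def m_hat_opt(F, m_prime, cost_ratio, var_ratio):
    """Estimated optimum m from (10.33) for a given value of F.

    cost_ratio is c1/c2 and var_ratio is S2^2/Su^2.
    Returns infinity when the quantity under the square root is not positive.
    """
    inner = F * (1 + m_prime / var_ratio) - 1
    if inner <= 0:
        return math.inf
    return math.sqrt(m_prime * cost_ratio) / math.sqrt(inner)


# F values quoted in the example for 9 and 30 degrees of freedom
lower = m_hat_opt(1.8490, m_prime=4, cost_ratio=10, var_ratio=1.69)
upper = m_hat_opt(0.4435, m_prime=4, cost_ratio=10, var_ratio=1.69)
print(round(lower, 1), round(upper, 1))   # prints: 2.8 9.0
```

The infinite upper limits in Table 10.3 correspond to the case in which $F(1 + m'S_u^2/S_2^2)$ does not exceed 1.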

The 80 and 95% limits for $n' = 5, 10, 20$ and $m' = 4$ appear in Table 10.3. With $n' = 20$,
we are almost certain to estimate $m_{opt}$ with precision close to the optimum. This is not so
with $n' = 5$.


TABLE 10.3
LOWER AND UPPER LIMITS FOR $\hat{m}_{opt}$

 n'       80%          95%
  5     2.5, ∞       1.8, ∞
 10     2.8, 9.0     2.3, ∞
 20     3.1, 6.4     2.7, 9.1


If the ratio $c_1/c_2$ is the same in the pilot survey as in the main survey, the cost of
the pilot survey will be proportional to $c_1n' + c_2n'm'$. Brooks (1955) gives a table
of the values of $(n', m')$ in the most economical pilot survey that provides an
expected relative precision of 90% in the estimation of $m_{opt}$. Table 10.4 shows part
of this table.


TABLE 10.4
PILOT SAMPLE DESIGNS HAVING AN EXPECTED RELATIVE PRECISION OF 90%

[Table body not recoverable in this copy.]

The computations assume that $N$ and $M$ are large; the designs are conservative
if fpc terms are taken into account. Note that no more than 10 primary units are
required and that the designs are relatively insensitive to the ratio $c_1/c_2$.


10.8 THREE-STAGE SAMPLING 


The process of subsampling can be carried to a third stage by sampling the
subunits instead of enumerating them completely. For instance, in surveys to
estimate crop production in India (Sukhatme, 1947), the village is a convenient
sampling unit. Within a village, only some of the fields growing the crop in
question are selected, so that the field is a subunit. When a field is selected, only
certain parts of it are cut for the determination of yield per acre; thus, the subunit
itself is sampled. If physical or chemical analyses of the crop are involved, an
additional subsampling may be used, since these determinations are often made
on a part of the sample cut from a field.

The results are a straightforward extension of those for two-stage sampling and
are given briefly. The population contains $N$ first-stage units, each with $M$
second-stage units, each of which has $K$ third-stage units. The corresponding
numbers for the sample are $n$, $m$, and $k$, respectively. Let $y_{iju}$ be the value obtained
for the $u$th third-stage unit in the $j$th second-stage unit drawn from the $i$th primary
unit. The relevant population means per third-stage unit are as follows.


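$$\bar{Y}_{ij} = \frac{\sum_{u}^{K}y_{iju}}{K}, \qquad \bar{\bar{Y}}_i = \frac{\sum_{j}^{M}\sum_{u}^{K}y_{iju}}{MK}, \qquad \bar{\bar{\bar{Y}}} = \frac{\sum_{i}^{N}\sum_{j}^{M}\sum_{u}^{K}y_{iju}}{NMK}$$

The following population variances are required.

$$S_1^2 = \frac{\sum_{i}^{N}(\bar{\bar{Y}}_i - \bar{\bar{\bar{Y}}})^2}{N-1}$$

$$S_2^2 = \frac{\sum_{i}^{N}\sum_{j}^{M}(\bar{Y}_{ij} - \bar{\bar{Y}}_i)^2}{N(M-1)}$$

$$S_3^2 = \frac{\sum_{i}^{N}\sum_{j}^{M}\sum_{u}^{K}(y_{iju} - \bar{Y}_{ij})^2}{NM(K-1)}$$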


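Theorem 10.3. If simple random sampling is used at all three stages, the
sample mean $\bar{\bar{\bar{y}}}$ per third-stage unit is an unbiased estimate of $\bar{\bar{\bar{Y}}}$ with variance

$$V(\bar{\bar{\bar{y}}}) = \frac{1-f_1}{n}S_1^2 + \frac{1-f_2}{nm}S_2^2 + \frac{1-f_3}{nmk}S_3^2 \tag{10.34}$$

where $f_1 = n/N$, $f_2 = m/M$, $f_3 = k/K$ are the sampling fractions at the three stages.

Proof. Only the principal steps are indicated. Write

$$\bar{\bar{\bar{y}}} - \bar{\bar{\bar{Y}}} = (\bar{\bar{\bar{y}}} - \bar{\bar{\bar{Y}}}_{nm}) + (\bar{\bar{\bar{Y}}}_{nm} - \bar{\bar{\bar{Y}}}_n) + (\bar{\bar{\bar{Y}}}_n - \bar{\bar{\bar{Y}}}) \tag{10.35}$$

where $\bar{\bar{\bar{Y}}}_{nm}$ is the population mean of the $nm$ second-stage units that were selected
and $\bar{\bar{\bar{Y}}}_n$ is the population mean of the $n$ primary units that were selected. When we
square and take the average, the cross-product terms vanish. The contributions of
the squared terms turn out to be as follows.

$$E(\bar{\bar{\bar{y}}} - \bar{\bar{\bar{Y}}}_{nm})^2 = \frac{1-f_3}{nmk}S_3^2$$

$$E(\bar{\bar{\bar{Y}}}_{nm} - \bar{\bar{\bar{Y}}}_n)^2 = \frac{1-f_2}{nm}S_2^2$$

$$E(\bar{\bar{\bar{Y}}}_n - \bar{\bar{\bar{Y}}})^2 = \frac{1-f_1}{n}S_1^2$$

When these three terms are added, the theorem is obtained.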


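Theorem 10.4. An unbiased estimate of $V(\bar{\bar{\bar{y}}})$ from the sample is

$$v(\bar{\bar{\bar{y}}}) = \frac{1-f_1}{n}s_1^2 + \frac{f_1(1-f_2)}{nm}s_2^2 + \frac{f_1f_2(1-f_3)}{nmk}s_3^2 \tag{10.36}$$

where $s_1^2$, $s_2^2$, $s_3^2$ are the sample analogues of $S_1^2$, $S_2^2$, $S_3^2$, respectively.

Proof. This may be proved by the methods in section 10.4 or alternatively by
showing that

$$E(s_1^2) = S_1^2 + \frac{1-f_2}{m}S_2^2 + \frac{1-f_3}{mk}S_3^2 \tag{10.37}$$

$$E(s_2^2) = S_2^2 + \frac{1-f_3}{k}S_3^2$$

and $E(s_3^2) = S_3^2$. To obtain the first result, let $\bar{\bar{y}}_{iK}$ denote the mean over the $m$
second-stage units in the $i$th primary unit, given that all $K$ elements were
enumerated at the third stage. Let $\bar{\bar{\bar{y}}}_K$ be the mean of the $n$ values $\bar{\bar{y}}_{iK}$. Then, from
(10.22) for two-stage sampling, it follows that

$$E\left[\frac{\sum_{i}^{n}(\bar{\bar{y}}_{iK} - \bar{\bar{\bar{y}}}_K)^2}{n-1}\right] = S_1^2 + \frac{1-f_2}{m}S_2^2 \tag{10.38}$$

Now, if $\bar{\bar{y}}_i$ is the sample mean for the $i$th primary unit, write

$$(\bar{\bar{y}}_i - \bar{\bar{\bar{y}}}) = (\bar{\bar{y}}_{iK} - \bar{\bar{\bar{y}}}_K) + [(\bar{\bar{y}}_i - \bar{\bar{y}}_{iK}) - (\bar{\bar{\bar{y}}} - \bar{\bar{\bar{y}}}_K)] \tag{10.39}$$

By first averaging over samples in which the first-stage and second-stage units
are fixed, it may be shown that

$$\frac{1}{n-1}E\sum_{i}^{n}[(\bar{\bar{y}}_i - \bar{\bar{y}}_{iK}) - (\bar{\bar{\bar{y}}} - \bar{\bar{\bar{y}}}_K)]^2 = \frac{1-f_3}{mk}S_3^2 \tag{10.40}$$

and that the cross-product terms from (10.39) contribute nothing. This establishes
the result for $E(s_1^2)$. That for $E(s_2^2)$ is found similarly. Hence

$$E[v(\bar{\bar{\bar{y}}})] = \frac{1-f_1}{n}\left(S_1^2 + \frac{1-f_2}{m}S_2^2 + \frac{1-f_3}{mk}S_3^2\right) + \frac{f_1(1-f_2)}{nm}\left(S_2^2 + \frac{1-f_3}{k}S_3^2\right) + \frac{f_1f_2(1-f_3)}{nmk}S_3^2$$

$$= \frac{1-f_1}{n}S_1^2 + \frac{1-f_2}{nm}S_2^2 + \frac{1-f_3}{nmk}S_3^2 = V(\bar{\bar{\bar{y}}}) \tag{10.41}$$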


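As with two-stage sampling, it is clear from (10.36) that if $f_1$ is negligible
$v(\bar{\bar{\bar{y}}})$ reduces to

$$v(\bar{\bar{\bar{y}}}) = \frac{s_1^2}{n} = \frac{\sum_{i}^{n}(\bar{\bar{y}}_i - \bar{\bar{\bar{y}}})^2}{n(n-1)} \tag{10.42}$$

This estimate is conservative if $f_1$ is not negligible.

With a cost function of the form

$$C = c_1n + c_2nm + c_3nmk \tag{10.43}$$

the optimum values of $k$ and $m$ are

$$k_{opt} = \frac{S_3}{\sqrt{S_2^2 - S_3^2/K}}\sqrt{c_2/c_3}, \qquad m_{opt} = \frac{\sqrt{S_2^2 - S_3^2/K}}{\sqrt{S_1^2 - S_2^2/M}}\sqrt{c_1/c_2} \tag{10.44}$$

The extension of the results in this section to additional stages of sampling should
be clear from the structure of the formulas.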
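
The algebra behind (10.41) can be checked numerically: substituting the expectations (10.37) into (10.36) should reproduce (10.34) for any choice of inputs. The sketch below does this and also evaluates (10.44) before rounding. All function names and the numerical values are assumptions made for illustration.

```python
import math


def three_stage_variance(S1_sq, S2_sq, S3_sq, N, M, K, n, m, k):
    """V of the three-stage sample mean, from (10.34)."""
    f1, f2, f3 = n / N, m / M, k / K
    return ((1 - f1) * S1_sq / n
            + (1 - f2) * S2_sq / (n * m)
            + (1 - f3) * S3_sq / (n * m * k))


def expected_estimate(S1_sq, S2_sq, S3_sq, N, M, K, n, m, k):
    """E[v] built from (10.36) and the expectations in (10.37)."""
    f1, f2, f3 = n / N, m / M, k / K
    E_s1 = S1_sq + (1 - f2) * S2_sq / m + (1 - f3) * S3_sq / (m * k)
    E_s2 = S2_sq + (1 - f3) * S3_sq / k
    E_s3 = S3_sq
    return ((1 - f1) * E_s1 / n
            + f1 * (1 - f2) * E_s2 / (n * m)
            + f1 * f2 * (1 - f3) * E_s3 / (n * m * k))


def optimum_k_m(S1_sq, S2_sq, S3_sq, M, K, c1, c2, c3):
    """(k_opt, m_opt) from (10.44), before rounding to integers."""
    A2 = S2_sq - S3_sq / K      # assumed positive
    A1 = S1_sq - S2_sq / M      # assumed positive
    k_opt = math.sqrt(S3_sq / A2) * math.sqrt(c2 / c3)
    m_opt = math.sqrt(A2 / A1) * math.sqrt(c1 / c2)
    return k_opt, m_opt


args = (4.0, 9.0, 16.0, 50, 20, 10, 5, 4, 2)   # made-up inputs
print(three_stage_variance(*args), expected_estimate(*args))
print(optimum_k_m(4.0, 9.0, 16.0, M=20, K=10, c1=100, c2=10, c3=1))
```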


10.9 STRATIFIED SAMPLING OF THE UNITS 


Subsampling may be combined with any type of sampling of the primary units. 
The subsampling itself may employ stratification or systematic sampling. Variance 
formulas for these modifications can be built up from the formulas for the simpler 
methods. 

Results are given for stratified sampling of the primary units in a two-stage 
sample. The primary unit sizes are assumed constant for a given stratum but may 
vary from stratum to stratum. This situation occurs when primary units are 
stratified by size so that sizes within a stratum become constant or nearly so. 

The $h$th stratum contains $N_h$ primary units, each with $M_h$ second-stage units;
the corresponding sample numbers are $n_h$ and $m_h$. The estimated population
mean per second-stage unit is

$$\bar{\bar{y}}_{st} = \frac{\sum_h N_hM_h\bar{\bar{y}}_h}{\sum_h N_hM_h} = \sum_h W_h\bar{\bar{y}}_h \tag{10.45}$$

where $W_h = N_hM_h/\sum N_hM_h$ is the relative size of the stratum in terms of second-
stage units and $\bar{\bar{y}}_h$ is the sample mean in the stratum. By applying theorem 10.1
within each stratum, we have

$$V(\bar{\bar{y}}_{st}) = \sum_h W_h^2\left(\frac{1-f_{1h}}{n_h}S_{1h}^2 + \frac{1-f_{2h}}{n_hm_h}S_{2h}^2\right) \tag{10.46}$$

where $f_{1h} = n_h/N_h$, $f_{2h} = m_h/M_h$.

From theorem 10.2, an unbiased sample estimate is

$$v(\bar{\bar{y}}_{st}) = \sum_h W_h^2\left[\frac{1-f_{1h}}{n_h}s_{1h}^2 + \frac{f_{1h}(1-f_{2h})}{n_hm_h}s_{2h}^2\right] \tag{10.47}$$

Corresponding variances for the estimated population total are obtained by
multiplying formulas (10.46) and (10.47) by $(\sum N_hM_h)^2$.


10.10 OPTIMUM ALLOCATION WITH STRATIFIED SAMPLING 


This deals with the best choice of the $n_h$ and the $m_h$. If travel costs between units
are not a major factor, the cost may be represented adequately by the formula

$$C = \sum_h c_{1h}n_h + \sum_h c_{2h}n_hm_h \tag{10.48}$$

From (10.46), the variance may be rewritten as

$$V(\bar{\bar{y}}_{st}) = \sum_h W_h^2\left[\frac{1}{n_h}\left(S_{1h}^2 - \frac{S_{2h}^2}{M_h}\right) + \frac{1}{n_hm_h}S_{2h}^2 - \frac{1}{N_h}S_{1h}^2\right]$$

The quantity

$$V(\bar{\bar{y}}_{st}) + \lambda\left(\sum_h c_{1h}n_h + \sum_h c_{2h}n_hm_h - C\right)$$

where $\lambda$ is a Lagrange multiplier, is a function of the variables $n_h$ and $(n_hm_h)$.
Hence, to minimize $V$ for fixed $C$, or vice versa, we have

$$n_h\sqrt{\lambda} = \frac{W_h\sqrt{S_{1h}^2 - S_{2h}^2/M_h}}{\sqrt{c_{1h}}} \tag{10.49}$$

$$n_hm_h\sqrt{\lambda} = \frac{W_hS_{2h}}{\sqrt{c_{2h}}} \tag{10.50}$$

These give

$$m_h = \frac{S_{2h}}{\sqrt{S_{1h}^2 - S_{2h}^2/M_h}}\sqrt{c_{1h}/c_{2h}} \tag{10.51}$$

The formula for optimum $m_h$ is exactly the same as in unstratified sampling
[(10.26) in section 10.6].
From (10.49), since $W_h \propto N_hM_h$,

$$n_h \propto \frac{N_hM_hS_{uh}}{\sqrt{c_{1h}}}, \qquad \text{where } S_{uh}^2 = S_{1h}^2 - \frac{S_{2h}^2}{M_h} \tag{10.52}$$

Since self-weighting estimates are convenient, we consider under what
circumstances the optimum allocation leads to a self-weighting estimate. From
(10.45), it follows that $\bar{\bar{y}}_{st}$ is self-weighting if $n_hm_h/N_hM_h = f_0$ = constant since, in
this event,

$$\bar{\bar{y}}_{st} = \frac{\sum_h (N_hM_h/n_hm_h)\sum_i\sum_j y_{hij}}{\sum_h N_hM_h} = \frac{\sum_h\sum_i\sum_j y_{hij}}{f_0\sum_h N_hM_h} = \frac{\sum_h\sum_i\sum_j y_{hij}}{\sum_h n_hm_h}$$

The condition is, as might be expected, that the probability $f_0$ of selecting a subunit
be the same in all strata.

From (10.50), the optimum allocation gives

$$f_{0h} = \frac{n_hm_h}{N_hM_h} \propto \frac{S_{2h}}{\sqrt{c_{2h}}} \tag{10.53}$$

Frequently $c_{2h}$, the cost per second-stage unit, will be approximately the same
in large and small primary units; but $S_{2h}$ may be greater in large units than in
small. However, since the optimum is flat, a self-weighting sample will often be
almost as precise as the optimum. Note that this result holds even if the optimum
sampling of primary units is far from proportional.


EXERCISES 


10.1 A set of 20,000 records are stored in 400 file drawers, each containing 50 records.
In a two-stage sample, five records are drawn at random from each of 80 randomly selected
drawers. For one item, the estimates of variance were $s_1^2 = 362$, $s_2^2 = 805$, as defined in
section 10.4. (a) Compute the standard error of the mean per record from this sample. (b)
Compare this with the standard error given by the approximate formula (10.23) in section
10.4.

10.2 From the results of a pilot two-stage sample, in which $m'$ subunits were chosen
from each of $n'$ units, it is useful to be able to estimate the value of $V(\bar{\bar{y}})$ that would be given
by a subsequent sample having $m$ subunits from each of $n$ units. Show that an unbiased
estimate of $V(\bar{\bar{y}})$ is

$$\left(\frac{1}{n} - \frac{1}{N}\right)s_1^2 + \frac{s_2^2}{n}\left(\frac{1}{m} - \frac{1}{m'}\right) + \frac{s_2^2}{N}\left(\frac{1}{m'} - \frac{1}{M}\right)$$

where $s_1^2$ and $s_2^2$ are computed from the preliminary sample. Hint. Use theorem 10.1 and
the result (10.22):

$$E(s_1^2) = S_1^2 - \frac{S_2^2}{M} + \frac{S_2^2}{m'}$$

10.3 In sampling wheat fields in Kansas, with the field as a primary unit, King and
McCarty (1941) report the following mean squares for yield in bushels per acre: $s_1^2 = 165$,
$s_2^2 = 66$. Two subsamples were taken per field. For a sample of $n$ fields, compare the
variances of the sample mean as given by (a) the sample as actually taken, (b) four
subsamples per field from $n$ fields, (c) completely harvesting $n$ fields.

$N$ and $M$ may be assumed large and constant. In (c) assume that complete harvesting is
equivalent to single-stage sampling (i.e., to having $m = M$).

10.4 In the same survey, with two subsamples per field, the mean squares for the
percentage of protein were $s_1^2 = 7.73$, $s_2^2 = 1.43$. How many fields are required to estimate
the mean yield to within ±1 bushel and the mean protein percentage to within ±1/4, apart
from a 1-in-20 chance in each case? Perform the calculations (a) assuming that two
subsamples per field are taken in the main survey, (b) assuming complete harvesting of a
field in the main survey.

10.5 For the wheat-yield data in exercise 10.3, what is the value of $c_1/c_2$ in a linear cost
function if the estimated optimum $m$ is 2?


10.6 If $m/M$ and $n/N$ are both small and the cost function is linear, show that $m = 2$
gives a smaller value of $V(\bar{\bar{y}})$ than $m = 1$ if

$$\frac{c_1}{c_2} > \frac{2S_1^2}{S_2^2}$$

10.7 A large department store handles about 20,000 accounts receivable per month. A
2% sample ($m = 400$) was verified each month over a 2-year period ($n = 24$). The numbers
of accounts found to be in error per month (out of 400) were (in order of magnitude) 0, 0, 1,
1, 2, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8, 9, 9, 10, 10, 13, 14, 17, the time pattern being erratic.
From the results in section 10.5, compute $s_1^2$ and $s_2^2$. Hence compute the standard error of
$p$, as an estimate of the percentage of accounts that are in error over a period of a year, that
would be obtained from verifying (a) 1200 accounts from a single month, chosen at
random, (b) 300 accounts from each of four random months, (c) 100 accounts each month.
Hint. Either use the formula in exercise 10.2 with $m' = 400$ or obtain unbiased estimates of
$S_1^2$ and $S_2^2$ and use theorem 10.1.


10.8 In planning a two-stage survey it was expected that $c_1/c_2$ would be about 4 and
that $S_2^2/S_u^2$ would lie between 5 and 50. (a) What value of $m$ would you choose from Table
10.2? (b) Suppose that after the survey was completed it was found that $c_1/c_2$ was close to 8
and $S_2^2/S_u^2$ was about 25. Compute the relative precision given by your $m$ to that given by
the optimum $m$. (c) Make the same computation for $c_1/c_2 = 4$, $S_2^2/S_u^2 = 100$.


10.9 If $\rho$ is the correlation coefficient between second-stage units in the same primary
unit, prove that

$$\frac{1-\rho}{\rho} = \frac{S_2^2}{[(N-1)/N]S_1^2 - S_2^2/M} \doteq \frac{S_2^2}{S_u^2}$$

(This establishes a result used in section 10.6.)

10.10 Show that if $S_u^2 > 0$, in the notation of section 10.6, a simple random sample of $n$
primary units, with 1 element chosen per unit, is more precise than a simple random sample
of $n$ elements ($n > 1$, $M > 1$). Show that the precision of the two methods is equal if $n/N$ is
negligible. Would you expect this intuitively?


CHAPTER 11 


Subsampling with Units of 
Unequal Sizes 


11.1 INTRODUCTION 


In sampling extensive populations, primary units that vary in size are encountered
frequently. Moreover, considerations of cost often dictate the use of
multistage sampling, so that the problems discussed in this chapter are of common
occurrence. If the sizes do not vary greatly, one method is to stratify by size of
primary unit, so that the units within a stratum become equal, or nearly so. The
formulas in section 10.9 may then be an adequate approximation. Often, however,
substantial differences in size remain within some strata, and sometimes it is
advisable to base the stratification on other variables. In a review of the British
Social Surveys, which are nationwide samples with districts as primary units, Gray
and Corlett (1950) point out that size was at first included as one of the variables
for stratification but that another factor was found more desirable when the
characteristics of the population became better known.

Some concentrated effort is required in order to obtain a good working 
knowledge of multistage sampling when the units vary in size, because the 
technique is flexible. The units may be chosen either with equal probabilities or 
with probabilities proportional to size or to some estimate of size. Various rules 
can be devised to determine the sampling and subsampling fractions, and various 
methods of estimation are available. The advantages of the different methods 
depend on the nature of the population, on the field costs, and on the supplemen- 
tary data that are at our disposal. 

The first part of this chapter is devoted to a description of the principal methods 
that are in use. We will begin with a population that consists of a single stratum. 
The extension to stratified sampling can be made, as in preceding chapters, by 
summing the appropriate variance formulas over the strata. For simplicity, we 
assume at first that only a single primary unit is chosen, that is, that n = 1. This case 
is not so impractical as it might appear at first sight, because when there is a large 
number of strata we may achieve satisfactory precision in estimation even though
$n_h = 1$. In the monthly surveys taken by the U.S. Census Bureau to estimate
numbers of employed people, the primary unit is a county or a group of
neighboring counties. This is a large unit, but it has administrative advantages that 
decrease costs. Since counties are far from uniform in their characteristics, 
stratification is extended to the point at which only one is selected from each 
stratum. Consequently, the theory to be discussed is applicable to a single stratum 
in this sampling plan. 

As in preceding chapters, the quantities to be estimated may be the population
total $Y$, the population mean (usually the mean per subunit $\bar{\bar{Y}}$), or a ratio of two
variates.

Notation. The observation for the $j$th subunit within the $i$th unit is denoted by
$y_{ij}$. The following symbols refer to the $i$th unit.

                            Population                  Sample
Number of subunits          $M_i$                       $m_i$
Mean per subunit            $\bar{Y}_i$                 $\bar{y}_i$
Total                       $Y_i = M_i\bar{Y}_i$        $y_i = m_i\bar{y}_i$

The following symbols refer to the whole population or sample.

                            Population                         Sample
Number of subunits          $M_0 = \sum^{N} M_i$               $\sum^{n} m_i$
Total                       $Y = \sum^{N} Y_i$                 $\sum^{n} y_i$
Mean per subunit            $\bar{\bar{Y}} = Y/M_0$            $\bar{\bar{y}} = \sum y_i/\sum m_i$
Mean per primary unit       $\bar{Y} = Y/N$                    $\bar{y} = \sum y_i/n$


11.2 SAMPLING METHODS WHEN n =1 


Suppose that the $i$th unit is selected and that it contains $M_i$ subunits, of which $m_i$
are sampled at random. We consider three methods of estimating $\bar{\bar{Y}}$, the mean per
subunit, or second-stage unit, as it is often called.

I. Units Chosen with Equal Probability

Estimate $= \bar{\bar{y}}_{I} = \bar{y}_i$

The estimate is the sample mean per subunit. It is biased, for in repeated sampling
from the same unit the average of $\bar{y}_i$ is $\bar{Y}_i$, and, since every unit has an equal chance
of being selected, the average of $\bar{Y}_i$ is

$$\frac{1}{N}\sum_{i=1}^{N}\bar{Y}_i = \bar{Y}_u$$

But the population mean is

$$\bar{\bar{Y}} = \frac{\sum_{i=1}^{N}M_i\bar{Y}_i}{M_0}, \qquad \text{where } M_0 = \sum_{i=1}^{N}M_i$$

Hence the bias equals $(\bar{Y}_u - \bar{\bar{Y}})$. Since the method is biased, we will compute the
mean square error (MSE) about $\bar{\bar{Y}}$. Write

$$\bar{\bar{y}}_{I} - \bar{\bar{Y}} = (\bar{y}_i - \bar{Y}_i) + (\bar{Y}_i - \bar{Y}_u) + (\bar{Y}_u - \bar{\bar{Y}})$$

Square and take the expectation over all possible samples. All contributions from
cross-product terms vanish. The expectations of the squared terms follow easily
by the methods given in Chapter 10. We find

$$MSE(\bar{\bar{y}}_{I}) = \frac{1}{N}\sum_{i=1}^{N}\frac{(M_i - m_i)}{M_i}\frac{S_{2i}^2}{m_i} + \frac{1}{N}\sum_{i=1}^{N}(\bar{Y}_i - \bar{Y}_u)^2 + (\bar{Y}_u - \bar{\bar{Y}})^2 \tag{11.1}$$

where

$$S_{2i}^2 = \frac{1}{M_i - 1}\sum_{j=1}^{M_i}(y_{ij} - \bar{Y}_i)^2$$

is the variance among subunits in the $i$th unit.

The MSE of $\bar{\bar{y}}_{I}$ contains three components: one arising from variation within
units, one from variation between the true means of the units, and one from the
bias.

The values of the $m_i$ have not been specified. The most common choice is either
to take all $m_i$ equal or to take $m_i$ proportional to $M_i$, that is, to subsample a fixed
proportion of whatever unit is selected. The choice of the $m_i$ affects only the first
of the three components of the variance, the component that arises from
variation within units.


II. Units Chosen with Equal Probability

Estimate $= \bar{\bar{y}}_{II} = \dfrac{NM_i\bar{y}_i}{M_0}$

This estimate is unbiased. Since $\bar{y}_i$ is an unbiased estimate of $\bar{Y}_i$, the product
$M_i\bar{y}_i$ is an unbiased estimate of the unit total $Y_i$. Hence $NM_i\bar{y}_i$ is an unbiased
estimate of the population total $Y$. Dividing by $M_0$, the total number of subunits in
the population, we obtain an unbiased estimate of $\bar{\bar{Y}}$.

To find $V(\bar{\bar{y}}_{II})$ which, of course, equals its MSE, we have

$$\bar{\bar{y}}_{II} - \bar{\bar{Y}} = \frac{NM_i\bar{y}_i}{M_0} - \bar{\bar{Y}}$$

Now $M_i\bar{Y}_i = Y_i$, the total for the unit, and $\bar{\bar{Y}} = N\bar{Y}/M_0$, where $\bar{Y}$ is the population
mean per unit. This gives

$$\bar{\bar{y}}_{II} - \bar{\bar{Y}} = \frac{NM_i}{M_0}(\bar{y}_i - \bar{Y}_i) + \frac{N}{M_0}(Y_i - \bar{Y})$$

Hence

$$V(\bar{\bar{y}}_{II}) = \frac{N}{M_0^2}\sum_{i=1}^{N}\frac{M_i(M_i - m_i)S_{2i}^2}{m_i} + \frac{N}{M_0^2}\sum_{i=1}^{N}(Y_i - \bar{Y})^2 \tag{11.2}$$

The between-units component of this variance (second term on the right)
represents the variation among the unit totals $Y_i$. This component is affected both
by variations in the $M_i$ from unit to unit and by variations in the means $\bar{Y}_i$ per element.
If the units vary considerably in size, this component is large, even though the
means per element $\bar{Y}_i$ are almost constant from unit to unit. Frequently this
component is so large that $\bar{\bar{y}}_{II}$ has a much higher MSE than the biased estimate $\bar{\bar{y}}_{I}$.
Thus neither method I nor method II is fully satisfactory.


III. Units Chosen with Probability Proportional to Size

Estimate $= \bar{\bar{y}}_{III} = \bar{y}_i$ = sample mean

This technique is due to Hansen and Hurwitz (1943). It gives a sample mean
that is unbiased and is not subject to the inflation of the variance in method II.

In repeated sampling, the $i$th unit appears with relative frequency $M_i/M_0$.
Hence

$$E(\bar{\bar{y}}_{III}) = \sum_{i=1}^{N}\frac{M_i}{M_0}\bar{Y}_i = \bar{\bar{Y}}$$

Furthermore,

$$\bar{\bar{y}}_{III} - \bar{\bar{Y}} = (\bar{y}_i - \bar{Y}_i) + (\bar{Y}_i - \bar{\bar{Y}})$$

Average first over samples in which the $i$th unit is selected.

$$E(\bar{\bar{y}}_{III} - \bar{\bar{Y}})^2 = \frac{(M_i - m_i)}{M_i}\frac{S_{2i}^2}{m_i} + (\bar{Y}_i - \bar{\bar{Y}})^2$$

Now average over all possible selections of the unit. Since the $i$th unit is selected
with relative frequency $M_i/M_0$,

$$V(\bar{\bar{y}}_{III}) = \sum_{i=1}^{N}\frac{M_i}{M_0}\left[\frac{(M_i - m_i)}{M_i}\frac{S_{2i}^2}{m_i} + (\bar{Y}_i - \bar{\bar{Y}})^2\right]$$

$$= \frac{1}{M_0}\sum_{i=1}^{N}\frac{(M_i - m_i)S_{2i}^2}{m_i} + \frac{1}{M_0}\sum_{i=1}^{N}M_i(\bar{Y}_i - \bar{\bar{Y}})^2 \tag{11.3}$$

Note that, as in method I, the between-units component arises from differences
among the means per subunit $\bar{Y}_i$ in the successive units. If these means per subunit
are nearly equal, this component is small.

TABLE 11.1
ARTIFICIAL POPULATION WITH UNITS OF UNEQUAL SIZES

Unit     $y_{ij}$             $M_i$   $Y_i$   $S_{2i}^2$   $\bar{Y}_i$   $\bar{Y}_i - \bar{\bar{Y}}$
  1      0, 1                   2       1      0.500         0.5          -2.25
  2      1, 2, 2, 3             4       8      0.667         2.0          -0.75
  3      3, 3, 4, 4, 5, 5       6      24      0.800         4.0          +1.25
Totals                         12      33


Example. Let us apply these results to a small population, artificially constructed. The
data are presented in Table 11.1. There are three units, with 2, 4, and 6 elements,
respectively. The reader may verify the figures given for $Y_i$, $S_{2i}^2$, and $\bar{Y}_i$. The population
mean $\bar{\bar{Y}}$ is 33/12, or 2.75. The unweighted mean of the $\bar{Y}_i$ is $2.167 = \bar{Y}_u$, so that the bias in
method I is −0.583. Its square, the contribution to the MSE, is 0.340.

One unit is to be selected and two subunits sampled from it. We consider four methods,
two of which are variants of method I.

Method Ia.
Selection: unit with equal probability, $m_i = 2$.
Estimate: $\bar{y}_i$ (biased).

Method Ib.
Selection: unit with equal probability, $m_i = \frac{1}{2}M_i$.
Estimate: $\bar{y}_i$ (biased).

Method II.
Selection: unit with equal probability, $m_i = 2$.
Estimate: $NM_i\bar{y}_i/M_0$ (unbiased).

Method III.
Selection: unit with probability $M_i/M_0$, $m_i = 2$.
Estimate: $\bar{y}_i$ (unbiased).

Method Ib (proportional subsampling) does not guarantee a sample size of 2 (it may be 1,
2, or 3), but the average sample size is 2.

By application of the sampling error formulas (11.1), (11.2), and (11.3), we obtain the
results in Table 11.2.


TABLE 11.2
MSE'S OF SAMPLE ESTIMATES OF $\bar{\bar{Y}}$

                 Contribution to MSE from               Total
Method    Within Units    Between Units    Bias          MSE
  Ia         0.144           2.056         0.340        2.540
  Ib         0.183           2.056         0.340        2.579
  II         0.256           5.792         0.000        6.048
  III        0.189           1.813         0.000        2.002

Although the example is artificial, the results are typical of those found in
comparisons made on many populations. Method III gives the smallest MSE
because it has the smallest contribution from variation between units. Method II,
although unbiased, is very inferior. Method Ia (equal size of subsample) is slightly
better than method Ib (proportional subsampling).

Some comparisons of these methods have also been made on actual popula- 
tions. For six items (total workers, total agricultural workers, total nonagricultural 
workers, estimated separately for males and females), Hansen and Hurwitz 
(1943) found that method III produced large reductions in the contribution from 
variation between units as compared with the unbiased method II, and reductions 
that averaged 30% as compared with method I. (They assumed the contribution 
from variation within units to be negligible.) In estimating typical farm items for 
the state of North Carolina, Jebe (1952) reported reductions in the total variance 
of the order of 15% as compared with methods of type I. In both studies the 
primary unit was a county.


11.3 SAMPLING WITH PROBABILITY PROPORTIONAL 
TO ESTIMATED SIZE 


As mentioned in Chapter 9, the sizes $M_i$ of the units are sometimes known only
approximately from previous data, and in other surveys several possible measures
of the size of a unit may be available. Let $z_i$ be the probability or relative size
assigned to the ith unit, where the $z_i$ are any set of positive numbers that add to
unity. We still assume n = 1.

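Method IV. An unbiased estimate of $\bar{\bar{Y}}$ is

$$\bar{y}_{IV} = \frac{M_i\bar{y}_i}{z_iM_0} \tag{11.4}$$

This follows because, in repeated sampling, the ith unit appears with relative
frequency $z_i$, so that

$$E(\bar{y}_{IV}) = \sum_{i=1}^{N} z_i\,\frac{M_i\bar{Y}_i}{z_iM_0} = \frac{1}{M_0}\sum_{i=1}^{N} M_i\bar{Y}_i = \bar{\bar{Y}}$$

The variance of $\bar{y}_{IV}$ is obtained in the usual way. Write

$$\bar{y}_{IV} - \bar{\bar{Y}} = \frac{M_i\bar{y}_i}{z_iM_0} - \bar{\bar{Y}} \qquad [\text{by } (11.4)]$$

$$= \frac{1}{M_0}\left[\frac{M_i}{z_i}(\bar{y}_i - \bar{Y}_i) + \left(\frac{Y_i}{z_i} - Y\right)\right]$$

In the variance, each square receives a weight $z_i$. Hence

$$V(\bar{y}_{IV}) = \frac{1}{M_0^2}\left[\sum_{i=1}^{N}\frac{M_i(M_i - m_i)}{z_im_i}S_{2i}^2 + \sum_{i=1}^{N} z_i\left(\frac{Y_i}{z_i} - Y\right)^2\right] \tag{11.5}$$

If $z_i = M_i/M_0$, (11.5) reduces to (11.3) for $V(\bar{y}_{III})$. If $z_i = 1/N$ (initial
probabilities equal), (11.5) reduces to (11.2) for the variance of the unbiased estimate
when probabilities are equal.

Unless $z_i = M_i/M_0$, the between-units component in (11.5) is affected to some
extent by variations in the sizes $M_i$ as well as by variations in the means per
element $\bar{Y}_i$.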


TABLE 11.3
COMPUTATION OF $V(\bar{y}_{IV})$

Unit   $M_i$   $M_i/M_0$   $z_i$   $m_i$   $M_i(M_i-m_i)/z_im_i$   $S_{2i}^2$   $Y_i$   $Y_i/z_i$   $Y_i/z_i - Y$
1        2       0.17       0.2      2              0               0.500        1        5          -28
2        4       0.33       0.4      2             10               0.667        8       20          -13
3        6       0.50       0.4      2             30               0.800       24       60          +27


Example. Table 11.3 shows the computations for finding $V(\bar{y}_{IV})$ in the artificial
population in Table 11.1. The $z_i$ have been taken as 0.2, 0.4, and 0.4, and the $m_i = 2$.
From (11.5), the variance comes out as follows:

$$\text{within-units contribution} = \frac{1}{M_0^2}\sum\frac{M_i(M_i - m_i)}{z_im_i}S_{2i}^2 = 0.213$$

$$\text{between-units contribution} = \frac{1}{M_0^2}\sum z_i\left(\frac{Y_i}{z_i} - Y\right)^2 = 3.583$$


Comparison with Table 11.2 shows that method IV has a lower variance than
the unbiased method II in which the primary unit is chosen with equal probabilities,
but method IV is decidedly inferior to method I or method III. In this
example method IV pays too high a price in order to obtain an unbiased estimate.
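The two contributions quoted in the example follow directly from (11.5). A minimal sketch, using the values of Tables 11.1 and 11.3 (variable names are illustrative):

```python
# Values from Tables 11.1 and 11.3.
M = [2, 4, 6]              # unit sizes M_i
Y = [1, 8, 24]             # unit totals Y_i
S2 = [0.5, 2 / 3, 0.8]     # within-unit variances S_2i^2
z = [0.2, 0.4, 0.4]        # selection probabilities z_i
m = [2, 2, 2]              # subsample sizes m_i
M0, Ytot = sum(M), sum(Y)

# Within-units term of (11.5): (1/M0^2) * sum M_i (M_i - m_i) S_2i^2 / (z_i m_i)
within_IV = sum(Mi * (Mi - mi) * s2 / (zi * mi)
                for Mi, mi, s2, zi in zip(M, m, S2, z)) / M0**2

# Between-units term of (11.5): (1/M0^2) * sum z_i (Y_i / z_i - Y)^2
between_IV = sum(zi * (Yi / zi - Ytot) ** 2 for zi, Yi in zip(z, Y)) / M0**2

print(round(within_IV, 3), round(between_IV, 3), round(within_IV + between_IV, 3))
```

The sum of the two terms is the total variance of method IV that is compared with the other methods.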

Consequently, it is natural to consider whether the sample mean (as in method 
I) would be better than the estimate adopted in method IV. 


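Method V. Units Chosen with Probability Proportional to Estimated Size
Estimate = $\bar{y}_V = \bar{y}_i$ = sample mean
The estimate is biased since, for example,

$$E(\bar{y}_V) = \sum z_i\bar{Y}_i = \bar{Y}_z$$

If the $z_i$ are good estimates, $\bar{Y}_z$ is close to the correct mean $\bar{\bar{Y}} = \sum M_i\bar{Y}_i/M_0$ and
the bias is small.
If we write

$$\bar{y}_V - \bar{\bar{Y}} = (\bar{y}_i - \bar{Y}_i) + (\bar{Y}_i - \bar{Y}_z) + (\bar{Y}_z - \bar{\bar{Y}})$$

the three components of the MSE work out as follows.

$$\mathrm{MSE}(\bar{y}_V) = \sum_{i=1}^{N}\frac{z_i(M_i - m_i)}{M_im_i}S_{2i}^2 + \sum_{i=1}^{N} z_i(\bar{Y}_i - \bar{Y}_z)^2 + (\bar{Y}_z - \bar{\bar{Y}})^2 \tag{11.6}$$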


Example. If the values of $z_i$ and $m_i$ are chosen as in Table 11.3, the reader may verify
that the components of the MSE of $\bar{y}_V$ are as shown in Table 11.4.


TABLE 11.4
CONTRIBUTIONS TO THE MSE IN METHOD V

Within Units    Between Units    Bias     Total MSE
   0.173            1.800        0.062      2.035


This is superior to all methods except method III (pps) and is almost as good as 
method III. 
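The three components of (11.6) for this example can be verified with a few lines. The sketch uses the values of Tables 11.1 and 11.3; the variable names are illustrative.

```python
M = [2, 4, 6]                  # unit sizes M_i
Ybar_i = [0.5, 2.0, 4.0]       # unit means per element
S2 = [0.5, 2 / 3, 0.8]         # within-unit variances S_2i^2
z = [0.2, 0.4, 0.4]            # selection probabilities z_i
m = [2, 2, 2]                  # subsample sizes m_i
M0 = sum(M)

Ybar = sum(Mi * yb for Mi, yb in zip(M, Ybar_i)) / M0     # true mean per element
Ybar_z = sum(zi * yb for zi, yb in zip(z, Ybar_i))        # expected value of the estimate

within_V = sum(zi * (Mi - mi) * s2 / (Mi * mi)
               for zi, Mi, mi, s2 in zip(z, M, m, S2))
between_V = sum(zi * (yb - Ybar_z) ** 2 for zi, yb in zip(z, Ybar_i))
bias2_V = (Ybar_z - Ybar) ** 2

mse_V = within_V + between_V + bias2_V
print(round(within_V, 3), round(between_V, 3), round(bias2_V, 4), round(mse_V, 3))
```

Here the squared bias is exactly 0.0625, which the table reports to three decimals.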


11.4 SUMMARY OF METHODS FOR n=1 


The five methods of estimating the mean per element $\bar{\bar{Y}}$ and their MSE's in the
numerical example are summarized in Table 11.5.
For estimating the population total $Y = M_0\bar{\bar{Y}}$, the preceding estimates are
multiplied by $M_0$ and their MSE's in Table 11.5 by $M_0^2 = 144$. Their relative
performances remain the same.


TABLE 11.5
TWO-STAGE SAMPLING METHODS (n = 1)

          Probabilities in                   Estimate                    Bias        MSE
Method    Selecting Units                    of $\bar{\bar{Y}}$          Status      in Example
I         Equal                              $\bar{y}_i$                 Biased      Ia: 2.540
                                                                                     Ib: 2.579
II        Equal                              $NM_i\bar{y}_i/M_0$         Unbiased    6.048
III       $M_i/M_0 \propto$ size             $\bar{y}_i$                 Unbiased    2.002
IV        $z_i \propto$ estimated size       $M_i\bar{y}_i/z_iM_0$       Unbiased    3.796
V         $z_i \propto$ estimated size       $\bar{y}_i$                 Biased      2.035


11.5 SAMPLING METHODS WHEN n > 1


For n > 1 the principal sampling methods are natural extensions both of the
preceding methods I to IV and of the methods discussed in Chapter 9A for
one-stage sampling with cluster units of unequal sizes. The quantities most
commonly estimated are population totals, population means per subunit, and
means or proportions that have the structure of ratio estimates.

In many applications the quantity $M_0$, the total number of subunits in the
population, is not known, but only the $M_i$ for the sample primary units that are
drawn, since these $M_i$ can be counted by listing. It is worth noting, therefore, that
methods II and IV and their extensions to n > 1 do not require knowledge of $M_0$
for estimation of the population total. Method I and its extensions do not require
knowledge of $M_0$ for estimating the mean per subunit. Method III, probability
proportional to size, requires knowledge of $M_0$.

Estimates are called self-weighting when they are simply a multiple of the total 
over all subunits in the sample. In view of the practical convenience of self- 
weighting estimates in large-scale surveys, the condition under which each 
estimate becomes self-weighting will be noted. For unbiased estimates the 
condition is, as will be seen, that every second-stage unit in the population or, 
more generally, every unit of the final stage of sampling, has an equal chance of 
being drawn. 


11.6 TWO USEFUL RESULTS 


In finding variances and sample estimates, two general results are useful. They 
show how to extend one-stage sampling results to two-stage or more generally 
multistage sampling. They were first proved by Durbin (1953) for estimators that 
are sums of estimators for the sample primary units, extended by Des Raj (1966) 
to linear functions of the primary unit estimators, and extended to still more 
complex cases by Rao (1975b). We will follow the approach used by Des Raj.

Primary units are chosen without replacement, with equal or unequal probabilities.
In the ith primary unit, let $\hat{Y}_i$ be an unbiased estimator of the unit total
$Y_i$, with second-stage variance $\sigma_{2i}^2$. Subsampling is independent in different
primary units. Consider an unbiased estimator of the population total Y of the
form

$$\hat{Y} = \sum_{i}^{n} w_{is}\hat{Y}_i \tag{11.7}$$

where the weight $w_{is}$ is known for every sample s, and may depend on other
primary units that are in the sample as well as on unit i.

We will adopt the device used in section 2.9 of letting $w_{is}'$ be a random variable
that equals $w_{is}$ if unit i appears in the sample and equals zero otherwise. This gives

$$\hat{Y} = \sum_{i=1}^{N} w_{is}'\hat{Y}_i \tag{11.8}$$

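Now

$$E(\hat{Y}) = E_1E_2(\hat{Y}) = E_1\left(\sum_{i=1}^{N} w_{is}'Y_i\right) = Y \tag{11.9}$$

if and only if $E_1(w_{is}') = 1$ for every i.

Theorem 11.1.

$$V(\hat{Y}) = V\left(\sum^{n} w_{is}\hat{Y}_i\right) = V_1\left(\sum^{n} w_{is}Y_i\right) + \sum_{i=1}^{N} E_1(w_{is}'^2)\,\sigma_{2i}^2 \tag{11.10}$$

Proof. By formula (10.2),

$$V(\hat{Y}) = V_1[E_2(\hat{Y})] + E_1[V_2(\hat{Y})] = V_1\left(\sum^{N} w_{is}'Y_i\right) + E_1\left(\sum^{N} w_{is}'^2\sigma_{2i}^2\right) \tag{11.11}$$

since the second-stage covariance between $\hat{Y}_i$ and $\hat{Y}_j$ $(i \ne j)$ is zero, because
subsampling is independent. Hence

$$V(\hat{Y}) = V_1\left(\sum^{n} w_{is}Y_i\right) + \sum^{N}[E_1(w_{is}'^2)]\,\sigma_{2i}^2 \tag{11.12}$$

This completes the proof.

The first term in (11.12) is the variance of the corresponding estimator when
every primary sample unit is completely enumerated, and it is obtained from
results in single-stage sampling.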


Example. For the two-stage analogue of the Horvitz-Thompson estimator,
$\hat{Y}_{HT} = \sum^{n}\hat{Y}_i/\pi_i$, the weight $w_{is}' = 1/\pi_i$ if unit i is in the sample and zero otherwise. Hence,
$E_1(w_{is}'^2) = \pi_i/\pi_i^2 = 1/\pi_i$, where $\pi_i$ is the probability that primary unit i is drawn.
Furthermore, if $m_i$ subunits are drawn from the $M_i$ by simple random sampling whenever unit i is
drawn,

$$\sigma_{2i}^2 = \frac{M_i(M_i - m_i)}{m_i}S_{2i}^2 \tag{11.13}$$

Hence, by theorem 11.1, using formula (9A.42) for the single-stage analogue of $V(\hat{Y}_{HT})$,
we have

$$V(\hat{Y}_{HT}) = \sum_{i=1}^{N}\sum_{j>i}(\pi_i\pi_j - \pi_{ij})\left(\frac{Y_i}{\pi_i} - \frac{Y_j}{\pi_j}\right)^2 + \sum_{i=1}^{N}\frac{M_i(M_i - m_i)}{m_i\pi_i}S_{2i}^2 \tag{11.14}$$


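Theorem 11.2. Suppose that we have an unbiased estimate $\hat{\sigma}_{2i}^2$ of the
second-stage variance $\sigma_{2i}^2$ of $\hat{Y}_i$, and an unbiased sample estimate of
$V_1(\sum^{n} w_{is}Y_i) = V_1(\sum^{N} w_{is}'Y_i)$ from one-stage sampling. The latter will be a quadratic of the form

$$v_1\left(\sum^{n} w_{is}Y_i\right) = \sum_{i}^{n} a_{is}Y_i^2 + 2\sum_{i}^{n}\sum_{j>i}^{n} b_{ijs}Y_iY_j \tag{11.15}$$

Then an unbiased sample estimator of $V(\sum w_{is}\hat{Y}_i)$ is

$$v\left(\sum^{n} w_{is}\hat{Y}_i\right) = \left(\sum_{i}^{n} a_{is}\hat{Y}_i^2 + 2\sum_{i}^{n}\sum_{j>i}^{n} b_{ijs}\hat{Y}_i\hat{Y}_j\right) + \sum_{i}^{n} w_{is}\hat{\sigma}_{2i}^2 \tag{11.16}$$

Thus the rule for constructing the sample estimator of $V(\sum w_{is}\hat{Y}_i)$ is: In an
unbiased estimator of $V_1(\sum w_{is}Y_i)$ from one-stage sampling, insert $\hat{Y}_i$ wherever $Y_i$
appears. To this add the term $\sum^{n} w_{is}\hat{\sigma}_{2i}^2$, where $\sum w_{is}\hat{Y}_i = \hat{Y}$, and $\hat{\sigma}_{2i}^2$ is an
unbiased estimator of $V_2(\hat{Y}_i)$.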
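Proof.

$$V_1\left(\sum^{N} w_{is}'Y_i\right) = \sum_{i}^{N} Y_i^2\,V_1(w_{is}') + 2\sum_{i}^{N}\sum_{j>i}^{N} Y_iY_j\,\mathrm{cov}(w_{is}', w_{js}') \tag{11.17}$$

Introduce the random variable $a_{is}'$, where $a_{is}' = a_{is}$ if unit i is in the sample
and $a_{is}' = 0$ otherwise. Similarly, let $b_{ijs}' = b_{ijs}$ if units i and j are in the sample and
$b_{ijs}' = 0$ otherwise. From (11.15) for one-stage sampling,

$$v_1\left(\sum^{n} w_{is}Y_i\right) = \sum_{i}^{N} a_{is}'Y_i^2 + 2\sum_{i}^{N}\sum_{j>i}^{N} b_{ijs}'Y_iY_j \tag{11.18}$$

If this is to be unbiased, comparison of (11.18) and (11.17) shows that we must
have $E_1(a_{is}') = V_1(w_{is}')$. Now for our variance estimator (11.16)

$$E_1E_2\left(\sum_{i}^{N} a_{is}'\hat{Y}_i^2 + 2\sum_{i}^{N}\sum_{j>i}^{N} b_{ijs}'\hat{Y}_i\hat{Y}_j\right) + E_1E_2\left(\sum_{i}^{N} w_{is}'\hat{\sigma}_{2i}^2\right)$$

$$= E_1\left(\sum_{i}^{N} a_{is}'Y_i^2 + 2\sum_{i}^{N}\sum_{j>i}^{N} b_{ijs}'Y_iY_j\right) + \sum_{i}^{N}[V_1(w_{is}') + E_1^2(w_{is}')]\,\sigma_{2i}^2 \tag{11.19}$$

In (11.19) we have used the results that $E_1(a_{is}') = V_1(w_{is}')$ and that for any i,
$E_1(w_{is}') = 1 = E_1^2(w_{is}')$. Continuing,

$$E_1E_2\left[v\left(\sum^{n} w_{is}\hat{Y}_i\right)\right] = V_1\left(\sum^{n} w_{is}Y_i\right) + \sum^{N} E_1(w_{is}'^2)\,\sigma_{2i}^2 = V\left(\sum^{n} w_{is}\hat{Y}_i\right) \tag{11.20}$$

by (11.12). This completes the proof.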

We consider now some specific estimators of the population total Y. In
extensive surveys, selection of primary units with unequal probabilities has
become the most commonly used technique. Sections 11.9 (sampling with
replacement) and 11.10 (sampling without replacement) deal with these methods;
section 11.13 presents the ratio estimator in ppz sampling. Other sections (11.7
and 11.8) give the corresponding methods when primary units are chosen with
equal probabilities.


11.7 UNITS SELECTED WITH EQUAL PROBABILITIES:
UNBIASED ESTIMATOR 


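Except where noted otherwise, the $m_i$ sample subunits in the ith unit are chosen
by simple random sampling. The unbiased estimator of the population total is

$$\hat{Y}_u = \frac{N}{n}\sum^{n} M_i\bar{y}_i = \frac{N}{n}\sum^{n}\hat{Y}_i \tag{11.21}$$

To apply theorem 11.1, note that

$$w_{is} = \frac{N}{n}; \qquad E_1(w_{is}') = \frac{n}{N}\cdot\frac{N}{n} = 1; \qquad E_1(w_{is}'^2) = \frac{n}{N}\cdot\frac{N^2}{n^2} = \frac{N}{n}$$

Hence, by theorem 11.1 and earlier results,

$$V(\hat{Y}_u) = \frac{N^2}{n}(1 - f_1)\frac{\sum^{N}(Y_i - \bar{Y})^2}{N - 1} + \frac{N}{n}\sum^{N}\frac{M_i^2(1 - f_{2i})S_{2i}^2}{m_i} \tag{11.22}$$

where $f_{2i} = m_i/M_i$. The estimator becomes self-weighting if $f_{2i}$ is constant ($= f_2$,
say). We then have

$$\hat{Y}_u = \frac{N}{nf_2}\sum^{n}\sum^{m_i} y_{ij} \tag{11.23}$$

The quantity $nf_2/N$ is, of course, the probability that any second-stage unit is
drawn.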
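Formula (11.22) can be checked by brute force on a small population. The sketch below enumerates every possible two-stage sample for a hypothetical design (n = 2 of the N = 3 units of Table 11.1, with $m_i = 2$ subunits per selected unit) and compares the exact variance of $\hat{Y}_u$ with the value given by (11.22). The design and the function names are illustrative assumptions, not part of the text.

```python
from itertools import combinations, product

units = [[0, 1], [1, 2, 2, 3], [3, 3, 4, 4, 5, 5]]   # population of Table 11.1
N, n, m = len(units), 2, 2    # hypothetical design: 2 of 3 units, 2 subunits each


def exact_variance():
    """Enumerate all first- and second-stage samples; return (mean, variance)."""
    first = second = 0.0
    pairs = list(combinations(range(N), n))           # equally likely unit samples
    for pair in pairs:
        # every subsample within each selected unit is equally likely
        subs = [list(combinations(units[i], m)) for i in pair]
        ests = [
            N / n * sum(len(units[i]) * sum(s) / m for i, s in zip(pair, choice))
            for choice in product(*subs)
        ]
        first += sum(ests) / len(ests) / len(pairs)
        second += sum(e * e for e in ests) / len(ests) / len(pairs)
    return first, second - first**2


def formula_11_22():
    """Between-units plus within-units terms of (11.22)."""
    M = [len(u) for u in units]
    Y = [sum(u) for u in units]
    Ybar = sum(Y) / N                                  # mean per primary unit
    S2 = [sum((y - Yi / Mi) ** 2 for y in u) / (Mi - 1)
          for u, Yi, Mi in zip(units, Y, M)]
    between = N**2 / n * (1 - n / N) * sum((Yi - Ybar) ** 2 for Yi in Y) / (N - 1)
    within = N / n * sum(Mi**2 * (1 - m / Mi) * s2 / m for Mi, s2 in zip(M, S2))
    return between + within


mean, var = exact_variance()
print(mean, var, formula_11_22())
```

The enumerated mean equals the population total of 33, which illustrates the unbiasedness of (11.21), and the enumerated variance agrees with (11.22).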


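For an unbiased sample estimate of variance, theorem 11.2 gives, from (11.16),

$$v(\hat{Y}_u) = \frac{N^2(1 - f_1)}{n}\,\frac{\sum^{n}(\hat{Y}_i - \hat{\bar{Y}}_u)^2}{n - 1} + \frac{N}{n}\sum^{n}\frac{M_i^2(1 - f_{2i})s_{2i}^2}{m_i} \tag{11.24}$$

where $\hat{\bar{Y}}_u = \sum^{n}\hat{Y}_i/n$.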


11.8 UNITS SELECTED WITH EQUAL PROBABILITIES: 
RATIO TO SIZE ESTIMATE 


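This estimator of the population total Y is

$$\hat{Y}_R = M_0\frac{\sum^{n} M_i\bar{y}_i}{\sum^{n} M_i} = M_0\frac{\sum^{n}\hat{Y}_i}{\sum^{n} M_i} \tag{11.25}$$

This is a typical ratio estimator, since both numerator and denominator vary
from sample to sample. It is used mainly for estimating means per subunit, for
which knowledge of $M_0$ is not required, and for which it is an extension of method
I with n = 1.

To find its approximate MSE, write

$$\hat{Y}_R - Y = M_0\left(\frac{\sum^{n} M_i\bar{y}_i}{\sum^{n} M_i} - \bar{\bar{Y}}\right) \doteq \frac{N}{n}\sum^{n} M_i(\bar{y}_i - \bar{\bar{Y}}) \tag{11.26}$$

where $\bar{\bar{Y}} = Y/M_0$.

Since $\hat{Y}_u = (N/n)\sum^{n} M_i\bar{y}_i$, we can obtain the approximate $\mathrm{MSE}(\hat{Y}_R)$ from
formula (11.22) for $V(\hat{Y}_u)$ by substituting $M_i(\bar{y}_i - \bar{\bar{Y}})$ for $M_i\bar{y}_i$ or, more generally,
$(y_{ij} - \bar{\bar{Y}})$ for $y_{ij}$. Now (11.22) is

$$V(\hat{Y}_u) = \frac{N^2}{n}(1 - f_1)\frac{\sum^{N}(Y_i - \bar{Y})^2}{N - 1} + \frac{N}{n}\sum^{N}\frac{M_i^2(1 - f_{2i})S_{2i}^2}{m_i}$$

In the substitution, $Y_i = M_i\bar{Y}_i$ becomes $M_i(\bar{Y}_i - \bar{\bar{Y}})$ while $\bar{Y} = \sum^{N} Y_i/N$ is
replaced by 0. Also $S_{2i}^2 = \sum_j (y_{ij} - \bar{Y}_i)^2/(M_i - 1)$ remains unchanged when $(y_{ij} - \bar{\bar{Y}})$
replaces $y_{ij}$. This gives the result

$$\mathrm{MSE}(\hat{Y}_R) \doteq \frac{N^2}{n}(1 - f_1)\frac{\sum^{N} M_i^2(\bar{Y}_i - \bar{\bar{Y}})^2}{N - 1} + \frac{N}{n}\sum^{N}\frac{M_i^2(1 - f_{2i})S_{2i}^2}{m_i} \tag{11.27}$$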


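As with $\hat{Y}_u$, this estimator becomes self-weighting if

$$f_{2i} = \frac{m_i}{M_i} = \text{constant} = f_2 = \frac{\bar{m}}{\bar{M}} = \frac{N\bar{m}}{M_0}$$

In this event the within-units contribution may be expressed more simply by
putting $m_i = N\bar{m}M_i/M_0$, giving

$$\mathrm{MSE}(\hat{Y}_R) \doteq \frac{N^2}{n}(1 - f_1)\frac{\sum^{N} M_i^2(\bar{Y}_i - \bar{\bar{Y}})^2}{N - 1} + \frac{M_0^2}{n\bar{m}}(1 - f_2)\sum^{N}\left(\frac{M_i}{M_0}\right)S_{2i}^2 \tag{11.28}$$

The resemblance to the corresponding formula when primary units are of equal
sizes may be noted. From (10.14) in section 10.3, multiplying by $M_0^2 = N^2M^2$,
we have

$$V(\hat{Y}) = \frac{N^2}{n}(1 - f_1)\frac{M^2\sum^{N}(\bar{Y}_i - \bar{\bar{Y}})^2}{N - 1} + \frac{M_0^2}{nm}(1 - f_2)\sum^{N}\left(\frac{1}{N}\right)S_{2i}^2 \tag{11.29}$$
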
The difference is that in (11.28) the contributions to the MSE from the primary
units are weighted, larger units receiving greater weight.

By theorem 11.2 an approximate sample estimate of $\mathrm{MSE}(\hat{Y}_R)$ in (11.27) is
given by

$$v(\hat{Y}_R) = \frac{N^2(1 - f_1)}{n}\,\frac{\sum^{n} M_i^2(\bar{y}_i - \hat{\bar{\bar{Y}}}_R)^2}{n - 1} + \frac{N}{n}\sum^{n}\frac{M_i^2(1 - f_{2i})s_{2i}^2}{m_i} \tag{11.30}$$

where $\hat{\bar{\bar{Y}}}_R = \sum^{n} M_i\bar{y}_i/\sum^{n} M_i = \hat{Y}_R/M_0$.

When this estimator is used to estimate the population mean per subunit, we
have $\hat{\bar{\bar{Y}}}_R = \sum^{n}\hat{Y}_i/\sum^{n} M_i$ and $v(\hat{\bar{\bar{Y}}}_R) = v(\hat{Y}_R)/M_0^2$. When $M_0$ is not known, we
substitute $\hat{M}_0 = N(\sum^{n} M_i/n)$ for $M_0$ in calculating $v(\hat{\bar{\bar{Y}}}_R)$.

When primary units are selected with equal probabilities, an alternative estimate
of the population mean per subunit (another extension of method I) is

$$\frac{1}{n}(\bar{y}_1 + \bar{y}_2 + \cdots + \bar{y}_n)$$

This estimate is self-weighting if $m_i$ = constant, as in the following example. When
$M_i$ and $\bar{Y}_i$ are uncorrelated, this estimate may be satisfactory, but it is liable to a
bias that does not vanish even with n large.


Example. From the volume American Men of Science, 20 pages were selected at
random. On each page the ages of two scientists, from two biographies also selected at
random, were recorded. The total number of biographies per page varies in general from
about 14 to 21. Estimate the average age and its standard error from the data in Table 11.6,
using the ratio estimate.

From the extreme right column,

$$\hat{\bar{\bar{Y}}}_R = \frac{\sum M_i\bar{y}_i}{\sum M_i} = \frac{17{,}121.5}{359} = 47.7 \text{ years}$$

Since n/N is negligible, we have from (11.30), dividing by $M_0^2$,

$$v(\hat{\bar{\bar{Y}}}_R) = \frac{\sum^{n} M_i^2(\bar{y}_i - \hat{\bar{\bar{Y}}}_R)^2}{n\bar{M}^2(n - 1)}$$

The numerator is computed as

$$\sum (M_i\bar{y}_i)^2 - 2\hat{\bar{\bar{Y}}}_R\sum (M_i\bar{y}_i)M_i + \hat{\bar{\bar{Y}}}_R^2\sum M_i^2$$

$$= 15{,}375{,}020 - (95.3844)(309{,}747.5) + (2274.55)(6481) = 571{,}300$$

Since $\bar{M} = 359/20$, as estimated from the sample, this gives

$$v(\hat{\bar{\bar{Y}}}_R) = \frac{(20)(571{,}300)}{(19)(359)^2} = 4.67$$

$$s(\hat{\bar{\bar{Y}}}_R) = 2.16 \text{ years}$$


TABLE 11.6
AGES OF 40 SCIENTISTS IN American Men of Science (n = 20, m = 2)

Unit            Ages              Total
No.    $M_i$   $y_{i1}$  $y_{i2}$   $y_i$    $M_i\bar{y}_i$
 1      15      47      30       77        577.5
 2      19      38      51       89        845.5
 3      19      43      45       88        836.0
 4      16      55      41       96        768.0
 5      16      59      45      104        832.0
 6      19      39      38       77        731.5
 7      18      43      43       86        774.0
 8      18      49      51      100        900.0
 9      18      45      35       80        720.0
10      18      46      59      105        945.0
11      20      71      64      135      1,350.0
12      18      35      46       81        729.0
13      19      61      54      115      1,092.5
14      19      45      87      132      1,254.0
15      18      31      38       69        621.0
16      16      64      39      103        824.0
17      16      63      47      110        880.0
18      19      36      33       69        655.5
19      19      61      39      100        950.0
20      19      54      34       88        836.0
Totals 359                    1,904     17,121.5
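The arithmetic of this example can be reproduced from the data of Table 11.6. The sketch below computes the ratio estimate of the mean age and its estimated variance from the first term of (11.30) with n/N taken as negligible, as in the example. Variable names are illustrative. Carried to full precision, the numerator is slightly below the rounded figure 571,300 printed in the example, but the variance and standard error agree to the decimals shown.

```python
# Data of Table 11.6: biographies per page (M_i) and the two sampled ages.
M = [15, 19, 19, 16, 16, 19, 18, 18, 18, 18,
     20, 18, 19, 19, 18, 16, 16, 19, 19, 19]
ages = [(47, 30), (38, 51), (43, 45), (55, 41), (59, 45),
        (39, 38), (43, 43), (49, 51), (45, 35), (46, 59),
        (71, 64), (35, 46), (61, 54), (45, 87), (31, 38),
        (64, 39), (63, 47), (36, 33), (61, 39), (54, 34)]

n = len(M)
ybar = [sum(a) / len(a) for a in ages]            # sample mean age per page

# Ratio-to-size estimate of the mean age per biography.
ratio_mean = sum(Mi * yb for Mi, yb in zip(M, ybar)) / sum(M)

# Between-units term of (11.30) divided by M0^2, with n/N negligible.
numerator = sum(Mi**2 * (yb - ratio_mean) ** 2 for Mi, yb in zip(M, ybar))
M_bar = sum(M) / n                                # mean page size, from the sample
variance = numerator / (n * M_bar**2 * (n - 1))
std_error = variance ** 0.5

print(round(ratio_mean, 1), round(variance, 2), round(std_error, 2))
```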


11.9 UNITS SELECTED WITH UNEQUAL PROBABILITIES WITH 
REPLACEMENT: UNBIASED ESTIMATOR 


Primary units are selected with probabilities proportional to $z_i$ with replacement.
Results for $z_i = M_i/M_0$ (probability proportional to size) follow as a special
case.

The subsample of $m_i$ subunits from the ith unit is assumed to be randomly
drawn without replacement. If the ith unit is selected more than once, we suppose
that on each selection the whole subsample is replaced, a new independent
drawing of $m_i$ units being made without replacement from the complete unit.

An unbiased estimate of the population total is

$$\hat{Y}_{ppz} = \frac{1}{n}\sum^{n}\frac{M_i\bar{y}_i}{z_i} = \frac{1}{n}\sum^{n}\frac{\hat{Y}_i}{z_i} \tag{11.31}$$

For n = 1, it was shown in section 11.3 that this estimator, $M_0\bar{y}_{IV} = \hat{Y}_{IV}$, is
unbiased. Its variance is obtained from formula (11.5) on multiplying by $M_0^2$ as

$$V(\hat{Y}_{IV}) = \sum_{i=1}^{N} z_i\left(\frac{Y_i}{z_i} - Y\right)^2 + \sum_{i=1}^{N}\frac{M_i(M_i - m_i)S_{2i}^2}{z_im_i} \tag{11.32}$$

With this method of sampling, the estimator $\hat{Y}_{ppz}$ is the mean of n independent
estimates of the form $\hat{Y}_{IV}$. Consequently, from classical sampling theory $\hat{Y}_{ppz}$ is
unbiased and

$$V(\hat{Y}_{ppz}) = \frac{1}{n}V(\hat{Y}_{IV}) = \frac{1}{n}\sum_{i=1}^{N} z_i\left(\frac{Y_i}{z_i} - Y\right)^2 + \frac{1}{n}\sum_{i=1}^{N}\frac{M_i^2(1 - f_{2i})S_{2i}^2}{m_iz_i} \tag{11.33}$$

Furthermore, given n independent estimates $\hat{Y}_{IV} = \hat{Y}_i/z_i$, an unbiased sample
estimator of $V(\hat{Y}_{IV})$ is, of course,

$$v(\hat{Y}_{IV}) = \frac{\sum_{i=1}^{n}\left(\dfrac{\hat{Y}_i}{z_i} - \hat{Y}_{ppz}\right)^2}{n - 1} \tag{11.34}$$

Consequently, an unbiased sample estimator of $V(\hat{Y}_{ppz})$ is the very simple
expression

$$v(\hat{Y}_{ppz}) = \frac{1}{n(n - 1)}\sum_{i=1}^{n}\left(\frac{\hat{Y}_i}{z_i} - \hat{Y}_{ppz}\right)^2 \tag{11.35}$$

These results hold also for multistage sampling, provided that $\hat{Y}_i$ is an unbiased
estimator of $Y_i$ and that subsampling is independent whenever a primary unit is
drawn.
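Because (11.31) and (11.35) involve only the n quantities $\hat{Y}_i/z_i$, they are easy to program. A minimal sketch follows; the function name and the three-draw data set are hypothetical and serve only to show the calculation.

```python
def ppz_estimate(unit_estimates, z):
    """Return (Y_ppz-hat, v(Y_ppz-hat)) for n with-replacement draws.

    unit_estimates[i] is an unbiased estimate of the total of the unit
    obtained on draw i; z[i] is that unit's selection probability.
    """
    n = len(unit_estimates)
    if n < 2:
        raise ValueError("at least two draws are needed to estimate the variance")
    scaled = [y / zi for y, zi in zip(unit_estimates, z)]   # Y_i-hat / z_i
    total = sum(scaled) / n                                 # (11.31)
    variance = sum((s - total) ** 2 for s in scaled) / (n * (n - 1))   # (11.35)
    return total, variance


# Hypothetical data: three draws with estimated unit totals and their z_i.
total, variance = ppz_estimate([2.0, 5.6, 4.8], [0.2, 0.4, 0.4])
print(total, variance)
```

No separate within-units term is needed: the spread of the $\hat{Y}_i/z_i$ already reflects both stages of sampling.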

To discover when $\hat{Y}_{ppz}$ becomes self-weighting in two-stage sampling, write

$$\hat{Y}_{ppz} = \frac{1}{n}\sum_{i}^{n}\frac{M_i}{z_im_i}\sum_{j}^{m_i} y_{ij} \tag{11.36}$$

The condition is therefore

$$\frac{nz_im_i}{M_i} = \text{constant} = f_0 \tag{11.37}$$

This expression is the probability that any specific second-stage unit is drawn.
From (11.37), $m_i/M_i = f_0/nz_i$. If the probability $f_0$ is chosen in advance, the field
worker can be told in advance what second-stage fraction $m_i/M_i$ to take from any
primary unit that has been chosen. For instance, suppose $f_0 = 1/50 = 0.02$ and
n = 60 primary units have been chosen. If $z_i = 0.0026$ for one selected unit, we
must have $m_i/M_i = 0.02/(60)(0.0026)$, or 1 in 7.8.

With a self-weighting sample, the variance estimator (11.35) takes the simpler
form

$$v(\hat{Y}_{ppz}) = \frac{n}{f_0^2(n - 1)}\sum^{n}(y_i - \bar{y})^2 \tag{11.38}$$

where $y_i = \sum_j y_{ij}$ is the sample total in the ith unit and $\bar{y} = \sum^{n} y_i/n$. The simplicity of the estimated
variances (11.35) and (11.38) is an attractive feature of with-replacement
sampling.
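The field instruction implied by (11.37) is a one-line calculation. The sketch below reproduces the figure of "1 in 7.8" from the numerical illustration; the function name and the check that the fraction does not exceed 1 are illustrative additions, not part of the text.

```python
def second_stage_fraction(f0, n, z_i):
    """Fraction m_i / M_i that makes Y_ppz-hat self-weighting, from (11.37)."""
    frac = f0 / (n * z_i)
    if frac > 1:
        # Assumption of this sketch: flag units that would need more than
        # complete enumeration to meet the chosen overall probability f0.
        raise ValueError("required fraction exceeds 1 for this unit")
    return frac


frac = second_stage_fraction(f0=0.02, n=60, z_i=0.0026)
print(f"take 1 in {1 / frac:.1f}")
```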
With this method there are other ways in which the subsamples may be drawn.
If the ith unit is selected $t_i$ times, one variant is to draw a single subsample of size
$t_im_i$ without replacement, provided that $M_i \ge m_it_i$. Sukhatme (1954) has shown
that with this method, $V(\hat{Y}_{ppz})$ is reduced by $(n - 1)\sum^{N} M_iS_{2i}^2/n$. Another method
is to draw a single subsample of size $m_i$ no matter how many times the ith unit is
selected. The estimate $M_i\bar{y}_i/z_i$ from this unit receives a weight $t_i$ (the number of
times that the unit has been drawn) with either method. The effect is to increase
$V(\hat{Y}_{ppz})$ by

$$\left(\frac{n - 1}{n}\right)\sum^{N}\frac{M_i^2(1 - f_{2i})S_{2i}^2}{m_i}$$

For the same cost, the differences in precision among these methods are seldom
likely to be substantial.

If $z_i = M_i/M_0$ the unbiased estimate (11.31) reduces to

$$\hat{Y}_{pps} = \frac{M_0}{n}\sum^{n}\bar{y}_i \tag{11.39}$$

Clearly this estimate becomes self-weighting when $m_i = m$ is constant, so that $f_0 = nm/M_0$.
An unbiased sample estimator of its variance is

$$v(\hat{Y}_{pps}) = \frac{M_0^2}{n(n - 1)}\sum^{n}(\bar{y}_i - \bar{\bar{y}})^2 \tag{11.40}$$

where $\bar{\bar{y}} = \sum^{n}\bar{y}_i/n$.


11.10 UNITS SELECTED WITHOUT REPLACEMENT 


For any of the "without-replacement" methods studied in Chapter 9A, the
formulas for the variance and estimated variance in two-stage sampling are
obtained from theorems 11.1 and 11.2. With the Brewer or Durbin methods

$$\hat{Y}_{HT} = \sum^{n}\frac{M_i\bar{y}_i}{\pi_i} = \frac{1}{n}\sum^{n}\frac{M_i\bar{y}_i}{z_i} = \frac{1}{n}\sum^{n}\frac{\hat{Y}_i}{z_i} \tag{11.41}$$

with variance

$$V(\hat{Y}_{HT}) = \sum_{i}^{N}\sum_{j>i}^{N}(\pi_i\pi_j - \pi_{ij})\left(\frac{Y_i}{\pi_i} - \frac{Y_j}{\pi_j}\right)^2 + \sum_{i}^{N}\frac{M_i^2(1 - f_{2i})S_{2i}^2}{m_i\pi_i} \tag{11.42}$$

For the corresponding estimator $\hat{Y}_{ppz}$ in sampling with replacement, we have,
from (11.33), with $\pi_i = nz_i$,

$$V(\hat{Y}_{ppz}) = \frac{1}{n}\sum_{i}^{N} z_i\left(\frac{Y_i}{z_i} - Y\right)^2 + \sum_{i}^{N}\frac{M_i^2(1 - f_{2i})S_{2i}^2}{m_i\pi_i} \tag{11.43}$$

Since the "within-units" contributions to the variance are the same in (11.42)
and (11.43), any relative gain in precision from selecting units without replacement
is watered down in two-stage samples by the within-units contribution to the
variance. For instance, in a stratified multistage sample of n = 44 provinces out of
N = 147 provinces, Des Raj (1964) found that the ratio $V(\hat{Y}_{wor})/V(\hat{Y}_{wr})$
averaged about 0.79 over seven items for the between-units component, but the
average ratio over both components was 0.92.

With n = 2 an unbiased sample estimate of $V(\hat{Y}_{HT})$ is

$$v(\hat{Y}_{HT}) = \frac{(\pi_1\pi_2 - \pi_{12})}{\pi_{12}}\left(\frac{\hat{Y}_1}{\pi_1} - \frac{\hat{Y}_2}{\pi_2}\right)^2 + \sum_{i=1}^{2}\frac{M_i^2(1 - f_{2i})s_{2i}^2}{m_i\pi_i} \tag{11.44}$$

where suffixes 1, 2 denote the chosen units.

The extension to the Sampford estimator for n > 2 is straightforward. The
alternative RHC estimator is

$$\hat{Y}_{RHC} = \sum_{g}^{n}\frac{Z_gM_g\bar{y}_g}{z_g} \tag{11.45}$$

where $Z_g = \sum z_i$ over the gth group and $M_g$, $\bar{y}_g$, and $z_g$ refer to the unit drawn from
the group. For a sample of n units an unbiased estimator of $V(\hat{Y}_{RHC})$ is

$$v(\hat{Y}_{RHC}) = \left(\frac{\sum N_g^2 - N}{N^2 - \sum N_g^2}\right)\sum_{g}^{n} Z_g\left(\frac{M_g\bar{y}_g}{z_g} - \hat{Y}_{RHC}\right)^2 + \sum_{g}^{n}\frac{Z_g}{z_g}\,\frac{M_g^2(1 - f_{2g})s_{2g}^2}{m_g} \tag{11.46}$$

Formulas (11.44) and (11.46) require separate calculation of the between-units
and the within-units contributions to the estimated variance. In extensive surveys
with many strata and numerous items, the complexity of such formulas makes the
estimation of variances a task requiring more computer time than can be devoted
to it in practice. Recall the much simpler form of the variance formula (11.35) when
units are drawn with replacement; that is,

$$v(\hat{Y}_{ppz}) = \frac{1}{n(n - 1)}\sum^{n}(\hat{Y}_i' - \hat{Y}_{ppz})^2$$

where $\hat{Y}_i' = M_i\bar{y}_i/z_i$. This result makes with-replacement sampling appealing if it
does not involve too much loss of precision.

Among without-replacement methods, Platek and Singh (1972), in planning
the redesign of the Canadian Labor Force Survey for strata in which $n_h = 6$ or a
multiple of 6, point out the advantages of $\hat{Y}_{RHC}$ and the Hartley-Rao version of
systematic sampling (section 9A.10) in simplicity and variance estimation.

For many small strata with $n_h = 2$, Durbin (1967) has produced methods that
give simple variance estimation when units are selected without replacement. He
uses $\hat{Y}_{HT}$ and seeks methods for which $\pi_i = 2z_i$. Within a stratum, the strategy is to
use selection methods that make $(\pi_1\pi_2\pi_{12}^{-1} - 1)$ either 1 or 0. These choices enable
$v(\hat{Y}_{HT})$ in (11.44) to reduce either to the term

$$\left(\frac{\hat{Y}_1}{\pi_1} - \frac{\hat{Y}_2}{\pi_2}\right)^2$$

or to the second term in (11.44). Brewer and Hanif (1969) have improved and
extended the methods to estimators other than $\hat{Y}_{HT}$.


11.11 COMPARISON OF THE METHODS 


In one-stage sampling (section 9A.12), equal-probability sampling with the
unbiased estimate $\hat{Y}_u$ was compared with the corresponding ratio estimates and
with various unequal-probability sampling methods. Roughly, the ratio of the
variance of $\hat{Y}_u$ to that of another method that sampled without replacement was
equal to the ratio of the coefficient of variation of the unit total $Y_i$ to that of $Y_i/z_i$.
As regards unequal-probability sampling with and without replacement, the ratio
$V(\hat{Y}_{wor})/V(\hat{Y}_{wr})$ roughly equaled $(N - n)/(N - 1)$, the same figure as in
equal-probability selection.

In two-stage and multistage sampling the net effect of the within-units contribution
to the variance is to dilute the differences created by the between-units
contributions. With the subsampling methods used in practice, the within-units
contribution is often approximately the same for different methods. The net effect
is that the overall relative precisions of different methods move toward equality.
In practice the within-units component is often at least as large as the between-units
component.

Even with large computers the self-weighting forms of the estimates are a
convenience and are widely used. Use of a self-weighting plan should incur only
minor loss of precision, since the choice of the $m_i$ affects only the within-units
contribution to the variance. The self-weighting form requires $m_i \propto M_i/z_i$, while
the $m_i$ that minimizes V for a given expected total number of subunits is
$m_i \propto M_iS_{2i}/z_i$. These two allocations differ little unless the $S_{2i}$ vary over a wide
range.

In surveys with many items we also noted (section 9A.13) that the $z_i$ used for
sample selection may not be a good choice for some items, $Y_i$ having little relation
to $z_i$. If some other variate $x_i$ can be found such that $y_i/x_i$ is reasonably constant,
one possibility is to switch to a ratio estimate (section 11.13). If no such $x_i$ is
available, the alternative estimate suggested by Rao (1966) is of the form

$$\hat{Y}_{ppz}^* = \frac{N}{n}\sum^{n} M_i\bar{y}_i \tag{11.47}$$

Preliminary evidence by Rao suggests that if $Y_i$ and $z_i$ are unrelated, $\hat{Y}_{ppz}^*$ may
perform about as well as $\hat{Y}_u$ would if sampling were with equal probabilities (as
illustrated in section 9A.13). If primary units are chosen with replacement, an
unbiased sample estimate of $V(\hat{Y}_{ppz}^*)$ is

$$v(\hat{Y}_{ppz}^*) = \frac{N^2}{n(n - 1)}\sum^{n}(M_i\bar{y}_i - \hat{\bar{Y}}_{ppz}^*)^2 \tag{11.48}$$

where $\hat{\bar{Y}}_{ppz}^* = \hat{Y}_{ppz}^*/N$. This expression does not include the contribution of the
bias in this estimate to its MSE, which it is hoped will be small if $Y_i$ and $z_i$ are
unrelated.


11.12 RATIOS TO ANOTHER VARIABLE 


In two-stage sampling the quantity to be estimated is often a ratio Y/X. This 
happens for two different reasons. As mentioned previously, if x is the value of y 
at a recent census, the ratio y/x may be relatively stable. An estimate of the 
population total or mean of y that is based on this ratio may be more precise than 
the estimates considered thus far in this chapter. 

Ratio estimates of this type are encountered also in the estimation of propor- 
tions or means over parts of the population. In an urban survey with the city block 
as primary unit, an example of a proportion of this type is 


number of employed males over 16 years 
total number of males over 16 years 


If $y_{ij} = 1$ for any employed male over 16 and $y_{ij} = 0$ otherwise, and $x_{ij} = 1$ for any
male over 16 and $x_{ij} = 0$ otherwise, the population proportion is Y/X. Other
examples for this type of survey are the average income of families that subscribe
to a certain magazine or the average amount of pocket money per teen-aged child.

With any of the preceding methods of selection of the units (equal or unequal
probabilities, with or without replacement), the standard approximate formula for
the MSE or variance of $\hat{Y}_R$ and $\hat{R}$ is easily obtained from the formula for $V(\hat{Y})$ for
the same method of sample selection by the technique which we have used
repeatedly.

To obtain $V(\hat{Y}_R)$, replace $y_{ij}$ by $d_{ij} = y_{ij} - Rx_{ij}$ in $V(\hat{Y})$ for the sampling method
used. For $V(\hat{R})$, also divide by $X^2$.

For the estimated MSE or variance $v(\hat{Y}_R)$, replace $y_{ij}$ by $d_{ij}' = y_{ij} - \hat{R}x_{ij}$ in $v(\hat{Y})$.
For $v(\hat{R})$, divide also by $\hat{X}^2$.

This follows by the method used in Theorem 2.5. For

$$\hat{Y}_R - Y = \frac{X}{\hat{X}}(\hat{Y} - R\hat{X}) \doteq (\hat{Y} - R\hat{X}) = \hat{D} \tag{11.49}$$

where $d_{ij} = y_{ij} - Rx_{ij}$. Since $E(\hat{D}) = 0$ with any sampling plan for which $\hat{Y}$ and $\hat{X}$
are unbiased, we find $E(\hat{Y}_R - Y)^2$ by taking the formula for $V(\hat{Y})$ for this plan and
substituting $d_{ij} = y_{ij} - Rx_{ij}$ in place of $y_{ij}$.


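For example, consider $\hat{Y}_u = (N/n)\sum^{n} M_i\bar{y}_i = (N/n)\sum^{n}\hat{Y}_i$ in sampling with equal
probabilities. From formula (11.22) in section 11.7,

$$V(\hat{Y}_u) = \frac{N^2}{n}(1 - f_1)\frac{\sum^{N}(Y_i - \bar{Y})^2}{N - 1} + \frac{N}{n}\sum^{N}\frac{M_i^2(1 - f_{2i})S_{2i}^2}{m_i} \tag{11.50}$$

Hence, with equal-probability sampling and $\hat{Y}_R = X\hat{Y}_u/\hat{X}_u$,

$$V(\hat{Y}_R) \doteq \frac{N^2}{n}(1 - f_1)\frac{\sum^{N}(Y_i - RX_i)^2}{N - 1} + \frac{N}{n}\sum^{N}\frac{M_i^2(1 - f_{2i})S_{d2i}^2}{m_i} \tag{11.51}$$

where

$$S_{d2i}^2 = \frac{1}{M_i - 1}\sum_{j}^{M_i}\left[(y_{ij} - Rx_{ij}) - (\bar{Y}_i - R\bar{X}_i)\right]^2$$

The conditions under which $\hat{Y}_R$ reduces to a multiple of the ratio of the sample
totals $\sum\sum y_{ij}/\sum\sum x_{ij}$ are always those under which the corresponding $\hat{Y}$ becomes
self-weighting; in this case $f_{2i} = m_i/M_i$ = constant.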


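For the estimated variance, substitution of $d_{ij}' = y_{ij} - \hat{R}x_{ij}$ for $y_{ij}$ in (11.24) for
$v(\hat{Y}_u)$ gives

$$v(\hat{Y}_R) = \frac{N^2(1 - f_1)}{n}\,\frac{\sum^{n}(\hat{Y}_i - \hat{R}\hat{X}_i)^2}{n - 1} + \frac{N}{n}\sum^{n}\frac{M_i^2(1 - f_{2i})s_{d'2i}^2}{m_i} \tag{11.52}$$

Similarly, with ppz selection with replacement we had for

$$\hat{Y}_{ppz} = \frac{1}{n}\sum^{n} M_i\bar{y}_i/z_i$$

(from (11.33) in section 11.9)

$$V(\hat{Y}_{ppz}) = \frac{1}{n}\sum^{N} z_i\left(\frac{Y_i}{z_i} - Y\right)^2 + \frac{1}{n}\sum^{N}\frac{M_i^2(1 - f_{2i})S_{2i}^2}{z_im_i} \tag{11.53}$$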

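Hence, for $\hat{Y}_R = X\hat{Y}_{ppz}/\hat{X}_{ppz}$ in sampling with replacement,

$$V(\hat{Y}_R) \doteq \frac{1}{n}\sum^{N}\frac{1}{z_i}(Y_i - RX_i)^2 + \frac{1}{n}\sum^{N}\frac{M_i^2(1 - f_{2i})S_{d2i}^2}{z_im_i} \tag{11.54}$$

From (11.35) a sample estimate $v(\hat{Y}_R)$ that is slightly biased is

$$v(\hat{Y}_R) = \frac{1}{n(n - 1)}\sum^{n}\left(\frac{\hat{Y}_i - \hat{R}\hat{X}_i}{z_i}\right)^2 \tag{11.55}$$

With Brewer's method of sampling without replacement, $\hat{Y}_{RB} = X\hat{Y}_{HT}/\hat{X}_{HT}$,
where $\hat{Y}_{HT} = \sum^{n} M_i\bar{y}_i/\pi_i$. For n = 2, formulas (11.42) and (11.44) give, for the ratio
estimate,

$$V(\hat{Y}_{RB}) \doteq \sum_{i}^{N}\sum_{j>i}^{N}(\pi_i\pi_j - \pi_{ij})\left(\frac{D_i}{\pi_i} - \frac{D_j}{\pi_j}\right)^2 + \sum_{i=1}^{N}\frac{M_i^2(1 - f_{2i})S_{d2i}^2}{m_i\pi_i} \tag{11.56}$$

where $D_i = Y_i - RX_i$, and

$$v(\hat{Y}_{RB}) = \frac{(\pi_i\pi_j - \pi_{ij})}{\pi_{ij}}\left(\frac{\hat{D}_i'}{\pi_i} - \frac{\hat{D}_j'}{\pi_j}\right)^2 + \sum^{2}\frac{M_i^2(1 - f_{2i})s_{d'2i}^2}{m_i\pi_i} \tag{11.57}$$

where $\hat{D}_i' = \hat{Y}_i - \hat{R}\hat{X}_i$ and units i and j are the two selected.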

11.13 CHOICE OF SAMPLING AND SUBSAMPLING 
FRACTIONS. EQUAL PROBABILITIES 


This problem is discussed first for the ratio-to-size estimate when units are chosen
with equal probabilities. The subsampling fraction $f_2 = m_i/M_i$ is assumed constant,
so that the estimate is the sample mean per subunit.

The simplest cost function contains three terms.

$c_u$ = fixed cost per primary unit
$c_2$ = cost per subunit
$c_l$ = cost of listing per subunit in a selected unit

The third term is included because the sampler must usually list the elements in
any selected unit and verify their number in order to draw a subsample. Hence

$$\text{cost} = c_un + c_2\sum^{n} m_i + c_l\sum^{n} M_i$$

This formula is not usable as it stands, since the cost depends on the particular set
of units that is chosen. Instead, consider the average cost over n units, which
equals

$$E(C) = c_un + c_2n\bar{m} + c_ln\bar{M} = (c_u + c_l\bar{M})n + c_2n\bar{m} = c_1n + c_2n\bar{m} \tag{11.58}$$

where $c_1$ now includes the average cost of listing a unit.


We determine n and $\bar{m} = f_2\bar{M}$ so as to minimize $V(\hat{\bar{\bar{Y}}}_R)$ for given cost or vice
versa. From (11.28) in section 11.8, dividing by $M_0^2 = (\bar{M}N)^2$,

$$\mathrm{MSE}(\hat{\bar{\bar{Y}}}_R) = \frac{1 - f_1}{n}\,\frac{\sum^{N} M_i^2(\bar{Y}_i - \bar{\bar{Y}})^2}{\bar{M}^2(N - 1)} + \frac{1 - f_2}{n\bar{m}}\sum^{N}\frac{M_i}{M_0}S_{2i}^2$$

Write

$$S_b^2 = \frac{\sum^{N} M_i^2(\bar{Y}_i - \bar{\bar{Y}})^2}{\bar{M}^2(N - 1)}$$

This is a weighted variance among unit means per element. It is analogous to the
variance $S_1^2$ in section 10.3 and reduces to $S_1^2$ if all $M_i$ are equal. We may also
write

$$S_2^2 = \sum^{N}\frac{M_i}{M_0}S_{2i}^2$$

This is a weighted mean of the within-unit variances. It reduces to the $S_2^2$ of
section 10.3 if all $M_i$ are equal.
In this notation, since $f_2 = \bar{m}/\bar{M}$,

$$\mathrm{MSE}(\hat{\bar{\bar{Y}}}_R) = \frac{1}{n}\left(S_b^2 - \frac{S_2^2}{\bar{M}}\right) + \frac{1}{n\bar{m}}S_2^2 - \frac{1}{N}S_b^2 \tag{11.59}$$

Applying the Cauchy-Schwarz inequality as usual to (11.58) and (11.59) we get

$$\bar{m}_{opt} = \frac{S_2}{\sqrt{S_b^2 - S_2^2/\bar{M}}}\sqrt{\frac{c_1}{c_2}} \tag{11.60}$$

The methods given in section 10.8 for utilizing knowledge about the ratios $S_2/S_b$
and $c_1/c_2$ to guide the selection of $\bar{m}_{opt}$ are applicable here. The unbiased estimate
when units are drawn with equal probabilities can be handled similarly.
The next section presents a more general analysis of this problem.
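As an illustration of how (11.60) and the cost function (11.58) might be used together, the sketch below computes $\bar{m}_{opt}$ and then the number of primary units that a given budget allows. The function names and all numerical inputs are hypothetical; when $S_b^2 - S_2^2/\bar{M}$ is not positive the formula does not apply, and the sketch simply raises an error rather than prescribing an alternative.

```python
from math import sqrt


def optimum_mbar(Sb2, S22, c1, c2, M_bar):
    """Optimum average subsample size from (11.60)."""
    denom = Sb2 - S22 / M_bar
    if denom <= 0:
        raise ValueError("S_b^2 - S_2^2 / M_bar must be positive for (11.60)")
    return sqrt(S22 / denom) * sqrt(c1 / c2)


def units_for_budget(budget, c1, c2, m_bar):
    """Number of primary units n satisfying E(C) = c1*n + c2*n*m_bar (11.58)."""
    return budget / (c1 + c2 * m_bar)


# Hypothetical inputs: variance components, unit costs, mean unit size, budget.
m_opt = optimum_mbar(Sb2=1.0, S22=4.0, c1=16.0, c2=1.0, M_bar=20.0)
n_units = units_for_budget(budget=1000.0, c1=16.0, c2=1.0, m_bar=m_opt)
print(round(m_opt, 2), round(n_units, 1))
```

In practice both values would be rounded to integers, and $\bar{m}$ cannot exceed the sizes of the units.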


11.14 OPTIMUM SELECTION PROBABILITIES AND 
SAMPLING AND SUBSAMPLING RATES 


An important early analysis by Hansen and Hurwitz (1949) determines at the
same time the optimum $z_i$ as functions of the $M_i$ and the optimum sampling and
subsampling fractions. Selection of units is with replacement. The analysis is
presented for $\hat{Y}_R$ in the self-weighting form, so that

$$m_i = f_0M_i/nz_i = f_0M_i/\pi_i$$

As in section 11.13, the cost function is

$$C = c_un + c_2\sum^{n} m_i + c_l\sum^{n} M_i$$

Although the $M_i$ are known, a listing cost is included for those units that are in the
sample, since listing may be needed to provide a frame for subsampling.
The average cost of sampling n units is 


E(C)= cun + CafoMa + c; 5 mM, (11.61) 
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By (11.54) in section 11.12, 


vaii r rgy Mm) 


Sha (11.62) 


Since dy =yj—Rx,, we may write (Y;—RX;)=M,D,. Noting that m; =nz, 
M,/nzim; = 1/fo, and combining the first and third terms in (11.62), we get 


V=V(¥p) 3[“(o7-Ser) Ms, (11.63) 


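The problem is to choose n, $f_0$, and the $\pi_i = nz_i$ to minimize V subject to fixed
average cost and to the restriction

$$\sum_i^N z_i = 1, \quad \text{i.e.,} \quad \sum_i^N \pi_i = n$$

Take $\lambda$ and $\mu$ as Lagrangian multipliers and minimize

$$V + \lambda\left[c_u n + c_2 f_0 M_0 + c_l \sum_i^N \pi_i M_i - E(C)\right] + \mu\left(n - \sum_i^N \pi_i\right) \qquad (11.64)$$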


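Differentiation gives

$$n: \quad \lambda c_u + \mu = 0; \qquad \mu = -\lambda c_u \qquad (11.65)$$

$$\pi_i: \quad -\frac{M_i^2}{\pi_i^2}\left(\bar D_i^2 - \frac{S_{d2i}^2}{M_i}\right) + \lambda c_l M_i - \mu = 0 \qquad (11.66)$$

Relations (11.65) and (11.66) lead to

$$\pi_i = nz_i \propto \frac{M_i\left(\bar D_i^2 - S_{d2i}^2/M_i\right)^{1/2}}{\sqrt{c_u + c_l M_i}} \qquad (11.67)$$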


Since the individual values of $(\bar D_i^2 - S_{d2i}^2/M_i)$ will not be known, we consider
how the average value may depend on the size of unit $M_i$ using the following
rough argument. Suppose a population were divided into units of size M. Since
$E(\bar D_i) = 0$, formula (9.10) on p. 241 gives

$$E(\bar D_i^2) = V(\bar D_i) \doteq \frac{S_d^2}{M}\left[1 + (M-1)\rho_M\right]$$

where $S_d^2$ is the population variance of the $d_{ij}$ and $\rho_M$ is the intraunit
correlation for units of size M. Also, from (9.15), p. 242,

$$E(S_{d2i}^2) \doteq S_d^2(1 - \rho_M)$$

hence

$$E\left(\bar D_i^2 - \frac{S_{d2i}^2}{M}\right) \doteq \frac{S_d^2}{M}\left[1 + (M-1)\rho_M - (1 - \rho_M)\right] = \rho_M S_d^2$$
From (11.67) this gives as an approximation

$$z_i \propto \frac{M_i\sqrt{\rho_{M_i}}}{\sqrt{c_u + c_l M_i}} \qquad (11.68)$$


With $\rho$ positive, $\rho_{M_i}$ may be expected to decrease as $M_i$ increases, since subunits
far apart are less subject to common influences, but this decrease may be only
slight for $\sqrt{\rho_{M_i}}$. Deductions from (11.68) are as follows.

1. If the cost of listing $c_l M_i$ is unimportant, $z_i \propto M_i$ (i.e., pps selection) is best if
$\sqrt{\rho_{M_i}}$ changes little over the range of sizes in the population. If $\rho_{M_i}$ decreases
noticeably, optimum probabilities lie between $z_i \propto M_i$ and $z_i \propto \sqrt{M_i}$.

2. If listing cost predominates, $z_i$ should lie between $\sqrt{M_i}$ and constant (equal
probabilities).

3. If listing costs and fixed costs are of the same order of magnitude, $z_i \propto \sqrt{M_i}$
may be a good compromise.
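
As a rough numerical illustration of (11.68), the Python sketch below normalizes
the quantities $M_i\sqrt{\rho_{M_i}}/\sqrt{c_u + c_l M_i}$ so that they sum to 1. The function
name, the unit sizes, the value of $\rho$, and the cost figures are hypothetical, chosen
only to reproduce the two limiting cases in deductions 1 and 2 with $\rho_{M_i}$ held constant.

```python
import numpy as np

def optimum_selection_probabilities(M, rho, c_u, c_l):
    """Normalized z_i proportional to M_i*sqrt(rho_i)/sqrt(c_u + c_l*M_i), as in (11.68)."""
    M = np.asarray(M, dtype=float)
    # rho may be a single value (constant intraunit correlation) or one value per unit
    rho = np.broadcast_to(np.asarray(rho, dtype=float), M.shape)
    raw = M * np.sqrt(rho) / np.sqrt(c_u + c_l * M)
    return raw / raw.sum()

M = [50, 100, 200, 400]  # hypothetical unit sizes

# Listing cost unimportant (c_l = 0): probabilities proportional to M_i (pps).
z_pps = optimum_selection_probabilities(M, rho=0.05, c_u=10.0, c_l=0.0)

# Listing cost predominant (c_u = 0): probabilities proportional to sqrt(M_i).
z_sqrt = optimum_selection_probabilities(M, rho=0.05, c_u=0.0, c_l=1.0)

print(z_pps)
print(z_sqrt)
```

Passing a vector for `rho` that decreases with $M_i$ moves the result away from these
limits in the direction described in the deductions.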


Differentiation of (11.64) with respect to the overall sampling fraction $f_0$ gives

$$f_0^2 = \frac{\sum_i^N M_i S_{d2i}^2}{\lambda c_2 M_0} \qquad (11.69)$$


The value of $\lambda$ is found in terms of the known $\pi$'s by adding (11.66) over all units.
This step leads to the result

N 1/2 
_| (cu +taM) yY, (Mi/Mo) Si: 
f= |] oe (11.70) 
az ie 5pSix) 
2 mai M, 

Comparison with (11.60) will show that $f_0$ has the same structure as in sampling
with equal probabilities, remembering that in (11.60), $f_0 = n\bar m_{opt}/M_0$ and
$c_1 = c_u + c_l\bar M$.

The optimum n is found from the average cost equation (11.61).


11.15 STRATIFIED SAMPLING. UNBIASED ESTIMATORS 


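For the unbiased methods ($\hat Y_u$, $\hat Y_{ppz}$, $\hat Y_B$, etc.) the extension to stratified
sampling is straightforward. The subscript h denotes the stratum.
The estimated population total is $\hat Y_{st} = \sum_h \hat Y_h$ with

$$V(\hat Y_{st}) = \sum_h V(\hat Y_h), \qquad v(\hat Y_{st}) = \sum_h v(\hat Y_h) \qquad (11.71)$$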


These variances are obtainable from the formulas already given. 
We may note the conditions under which the estimates become self-weighting.
For $\hat Y_{ppz}$ (section 11.9) or $\hat Y_B$ (section 11.10),

$$\hat Y_{st} = \sum_h \frac{1}{n_h}\sum_i^{n_h} \frac{M_{hi}}{m_{hi} z_{hi}}\, y_{hi} \qquad (11.72)$$


where $y_{hi}$ is the total over the $m_{hi}$ subunits from the ith unit in stratum h. These
estimates are seen to be self-weighting within strata if the probability
$f_{0h} = n_h z_{hi} m_{hi}/M_{hi}$ of selecting any subunit in the stratum is constant within
the stratum. In this event, the estimate becomes

$$\hat Y_{st} = \sum_h \frac{1}{f_{0h}} \sum_i^{n_h} y_{hi} \qquad (11.73)$$

which is completely self-weighting if $f_{0h}$ is the same in all strata, as might be
anticipated intuitively.
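
The reduction from (11.72) to (11.73) can be checked numerically for a single
stratum. The sketch below is only an illustration: the function names and all
data values are hypothetical, and the subsample sizes are left unrounded,
whereas in practice they would be rounded to integers.

```python
import numpy as np

def self_weighting_subsample_sizes(M, z, n, f0):
    """m_i = f0 * M_i / (n * z_i), the choice that makes f0 constant within the stratum."""
    M = np.asarray(M, dtype=float)
    z = np.asarray(z, dtype=float)
    return f0 * M / (n * z)

def ppz_stratum_total(y, M, m, z):
    """Stratum term of (11.72): (1/n) * sum of M_i * y_i / (m_i * z_i),
    where y_i is the subsample total in the i-th selected unit."""
    y, M, m, z = (np.asarray(a, dtype=float) for a in (y, M, m, z))
    return np.mean(M * y / (m * z))

# Hypothetical stratum: n = 3 selected units
M = [120.0, 300.0, 80.0]   # unit sizes
z = [0.2, 0.5, 0.3]        # selection probabilities of the selected units
y = [31.0, 44.0, 12.5]     # subsample totals
n, f0 = 3, 0.05

m = self_weighting_subsample_sizes(M, z, n, f0)
general = ppz_stratum_total(y, M, m, z)   # form (11.72)
shortcut = sum(y) / f0                    # form (11.73)
print(general, shortcut)
```

With these subsample sizes every sampled subunit carries the same weight $1/f_0$,
so the two forms agree.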

If units are of the same size within a given stratum (i.e., $M_{hi} = M_h$), it was shown
(section 10.10) that sample allocation leading to a completely self-weighting
estimate is close to optimum, provided that $S_{2h}/\sqrt{c_{2h}}$ is reasonably constant.
When the $M_{hi}$ vary within strata, a corresponding result for estimators like $\hat Y_{ppz}$
and $\hat Y_B$ is as follows. Suppose that the cost function is linear as in sections 11.13
and 11.14, and that the estimator is self-weighting within strata (i.e.,
$f_{0h} = n_h z_{hi} m_{hi}/M_{hi}$). Then the optimum $f_{0h}$ for given expected cost may be shown
to be of the form

$$f_{0h} \propto \frac{1}{\sqrt{c_{2h}}}\left[\sum_i (M_{hi}/M_{0h})\, S_{2hi}^2\right]^{1/2} \qquad (11.74)$$

where $M_{0h} = \sum_i M_{hi}$. Thus, choice of $f_{0h} = f_0$, which makes these estimators
completely self-weighting, will be near-optimal as regards precision unless either the
$S_{2hi}$ or the $c_{2h}$ vary widely from stratum to stratum, especially since this choice
affects only the second-stage contribution to the variance.


11.16 STRATIFIED SAMPLING. RATIO ESTIMATES 


The formulas for the separate estimate $\hat Y_{Rs}$ follow from those in section 11.12
for a single stratum, assuming a large sample in each stratum and independent
sampling in different strata.

For the combined estimate, $\hat Y_{Rc} = X\hat Y_{st}/\hat X_{st}$,

$$\hat Y_{Rc} - Y = X\sum_h^L \hat Y_h \Big/ \sum_h^L \hat X_h - Y \doteq (\hat Y_{st} - R\hat X_{st}) = \sum_h^L (\hat Y_h - R\hat X_h)$$

Hence, we substitute $d_{hij} = y_{hij} - Rx_{hij}$ for $y_{hij}$ to obtain the approximate formulas
for $V(\hat Y_{Rc})$ from those for $V(\hat Y_{st})$ by the sampling plan used. For $v(\hat Y_{Rc})$ we
substitute $d_{hij}' = y_{hij} - \hat R_c x_{hij}$ in $v(\hat Y_{st})$.

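For example, with unequal probabilities with replacement (section 11.9),
formula (11.33) leads to

$$V(\hat Y_{Rc}) \doteq \sum_h \frac{1}{n_h}\sum_i^{N_h}\left[z_{hi}\left(\frac{D_{hi}}{z_{hi}} - D_h\right)^2 + \frac{M_{hi}^2(1 - f_{2hi})S_{d2hi}^2}{m_{hi} z_{hi}}\right] \qquad (11.75)$$

where

$$S_{d2hi}^2 = \frac{1}{M_{hi} - 1}\sum_j^{M_{hi}}\left[(y_{hij} - Rx_{hij}) - (\bar Y_{hi} - R\bar X_{hi})\right]^2$$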
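For the estimated variance, formula (11.35) gives, after the substitution,

$$v(\hat Y_{Rc}) = \sum_h \sum_i^{n_h} \frac{(D_{hi}' - \bar D_h')^2}{n_h(n_h - 1)} \qquad (11.76)$$

where

$$D_{hi}' = \frac{M_{hi}\bar d_{hi}'}{z_{hi}}, \qquad \bar D_h' = \sum_i^{n_h} \frac{D_{hi}'}{n_h}, \qquad \bar d_{hi}' = \bar y_{hi} - \hat R_c \bar x_{hi}$$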


11.17 NONLINEAR ESTIMATORS IN COMPLEX SURVEYS 


In addition to the estimation of totals, means, ratios and differences among
them, analyses of survey data may involve estimates of more complex mathematical
structure (e.g., simple and partial correlation coefficients, medians, and other
percentiles). The objectives of the analyses may include attaching a confidence
interval to the quantity estimated or performing a test of significance.

Given a random sample from an infinite population, statistical theory has
produced a variety of methods for meeting these objectives: exact small-sample
methods based on assumptions about the nature of the distributions followed by
the observations, and approximate large-sample methods requiring fewer
assumptions. With the more complex types of sample studied in this book, on the
other hand, we have been able to give methods for computing unbiased estimators
of the variances of unbiased linear estimators like $\hat Y$. Assuming the
sample large enough so that these estimators have distributions close to normality,
the normal tables supply confidence limits and tests of significance. There remain
many problems with nonlinear estimators in complex surveys.
Three approximate methods for estimating the standard errors of nonlinear
estimators have been produced. They will be described for stratified random 
samples with n, = 2 in all strata, the case for which they have received the most 
study. All methods give the usual unbiased variance estimates when the estimator 
is linear. The sample can be a multistage sample, primary units being drawn either 
with equal probabilities or with unequal probabilities with replacement. 


11.18 TAYLOR SERIES EXPANSION 


This is the method that produced the approximate formulas for the estimated
variances of $\hat R$, and $\hat R_s$, $\hat R_c$ in stratified sampling. The function to be
estimated, $f(Y_1, Y_2, \ldots, Y_k) = f(Y)$ is expressed as a function of the population
totals of certain variables, the estimate $f(\hat Y)$ being the corresponding function of
unbiased sample estimates of Y. For variable j with $n_h = 2$, when units are chosen
with ppz with replacement, the population total and its linear estimator are

$$Y_j = \sum_h^L \sum_i^{N_h} Y_{jhi}; \qquad \hat Y_j = \sum_h^L \frac{1}{2}\sum_i^2 \frac{\hat Y_{jhi}}{z_{hi}} = \sum_h^L (y_{jh1}' + y_{jh2}') \qquad (11.77)$$

where $y_{jhi}' = \hat Y_{jhi}/2z_{hi}$ and $\hat Y_{jhi}$ is an unbiased sample estimate of $Y_{jhi}$ from the
subsample in this unit.


From (11.35) an unbiased estimate of $V(\hat Y_j)$ works out as

$$v(\hat Y_j) = \sum_h^L v(\hat Y_{jh}) = \sum_h^L (y_{jh1}' - y_{jh2}')^2 \qquad (11.78)$$

Hence, by Woodruff's (1971) extension in section 6.13 of the Keyfitz short-cut
method for $n_h = 2$, a Taylor series approximation to the variance of $f(\hat Y)$ is

$$v[f(\hat Y)] = \sum_h^L \left[\sum_j^k \frac{\partial f}{\partial Y_j}(y_{jh1}' - y_{jh2}')\right]^2 \qquad (11.79)$$
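
To make (11.79) concrete, the sketch below applies it to the simplest nonlinear
case, a ratio $f(Y) = Y_1/Y_2$ of two totals, with the partial derivatives evaluated
at the sample estimates. The function name and the data are hypothetical; the
inputs are the quantities $y_{jhi}'$ of (11.77), arranged with one row per stratum and
one column per sampled unit.

```python
import numpy as np

def taylor_variance_ratio(y1, y2):
    """Ratio estimate and its Taylor variance estimate from (11.79).

    y1, y2: arrays of shape (L, 2) holding y'_{1hi} and y'_{2hi}
    (L strata, two sampled units per stratum).
    """
    y1 = np.asarray(y1, dtype=float)
    y2 = np.asarray(y2, dtype=float)
    Y1_hat, Y2_hat = y1.sum(), y2.sum()          # linear estimators of the two totals
    ratio = Y1_hat / Y2_hat
    # Partial derivatives of f = Y1/Y2, evaluated at the sample estimates
    df_dY1 = 1.0 / Y2_hat
    df_dY2 = -Y1_hat / Y2_hat**2
    # Inner sum over variables j within each stratum, then square and add over strata
    per_stratum = df_dY1 * (y1[:, 0] - y1[:, 1]) + df_dY2 * (y2[:, 0] - y2[:, 1])
    return ratio, float(np.sum(per_stratum**2))

# Hypothetical data for L = 2 strata
y1 = [[3.0, 5.0], [2.0, 4.0]]
y2 = [[1.0, 2.0], [1.0, 3.0]]
ratio, v = taylor_variance_ratio(y1, y2)
print(ratio, v)
```

For a ratio this is the same as applying (11.78) to the linearized variable
$(y_1' - \hat R y_2')/\hat Y_2$, which is the form of the usual approximate variance of a
ratio estimate.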

Expression of a nonlinear function of the measured variables in the form $f(Y)$
may require some work and care and is not possible with some functions of
interest. Consider a simple example: simple random single-stage sampling with
primary units of equal sizes, where the correlation between the unit totals of two
variables $U_1$ and $U_2$ is to be estimated. The population value $f(Y)$ is

$$\rho = \frac{\sum\sum U_{1hi}U_{2hi} - \left(\sum\sum U_{1hi}\right)\left(\sum\sum U_{2hi}\right)/N}{\left\{\left[\sum\sum U_{1hi}^2 - \left(\sum\sum U_{1hi}\right)^2/N\right]\left[\sum\sum U_{2hi}^2 - \left(\sum\sum U_{2hi}\right)^2/N\right]\right\}^{1/2}}$$

In terms of our variables $Y_j$, this is a function of five such variables.

$$Y_{1hi} = U_{1hi}U_{2hi}; \quad Y_{2hi} = U_{1hi}^2; \quad Y_{3hi} = U_{2hi}^2; \quad Y_{4hi} = U_{1hi}; \quad Y_{5hi} = U_{2hi}$$

with $y_{1hi}' = U_{1hi}U_{2hi}/2z_{hi}$, $i = 1, 2$, $z_{hi} = 1/N_h$, in this case.


11.19 BALANCED REPEATED REPLICATIONS 


We consider this method first for a linear function of a single variable. Let
$f(Y) = Y$, with its estimate $f(\hat Y) = \hat Y = \sum_h^L (y_{h1}' + y_{h2}')$, where
$y_{hi}' = \hat Y_{hi}/2z_{hi}$. An unbiased variance estimate $v(\hat Y)$ is, from (11.78),

$$v(\hat Y) = \sum_h^L (y_{h1}' - y_{h2}')^2 \qquad (11.80)$$

Select a half-sample H by choosing one unit from each stratum. The estimate of
$f(Y) = Y$ from this half-sample is $f(H) = 2\sum_h y_{hi}'$, where hi is the unit chosen in
stratum h. Hence, if $f(S)$ denotes the estimate of $f(Y)$ made from the whole
sample,

$$f(H) - f(S) = 2\sum_h y_{hi}' - \sum_h (y_{h1}' + y_{h2}') = \sum_h \pm (y_{h1}' - y_{h2}') \qquad (11.81)$$


the signs depending on whether unit 1 or unit 2 was selected in stratum h.

Comparison of (11.81) and (11.80) shows that for any of the $2^L$ possible
selections, $[f(H) - f(S)]^2$ contains the correct squared terms $(y_{h1}' - y_{h2}')^2$ in $v(\hat Y)$
in every stratum. However, for any specific half-sample every pair of strata
contributes a cross-product term with a $\pm$ sign to $[f(H) - f(S)]^2$. McCarthy (1966)
noted that a set of balanced half-samples can be found in which, for every pair of
strata, half the cross-product signs in the set are + and half -. If this set contains g
different half-samples, the unwanted cross-product terms cancel when we sum
over the set giving

$$\frac{1}{g}\sum_i^g [f(H_i) - f(S)]^2 = \sum_h^L (y_{h1}' - y_{h2}')^2 = v(\hat Y) \qquad (11.82)$$


If $C_i$ is the complementary half-sample to $H_i$, second and third forms are

$$\frac{1}{g}\sum_i^g [f(C_i) - f(S)]^2; \qquad \frac{1}{4g}\sum_i^g [f(H_i) - f(C_i)]^2 \qquad (11.83)$$

A fourth estimator is the average of the first two above.

Balanced sets of this type were given earlier 
use in experimental design for g an 
balanced set has g equal to the smallest multiple of 4 that is >L. With L = 5 strata, 
g =8. Table 11.7 shows a balanced set for L= ; itl, i 
unit 2 in a stratum. ; 
TABLE 11.7
BALANCED HALF-SAMPLES FOR L=5 STRATA 
HALF-SAMPLE 
Stratum PRES AA GIAN 
1 Hartt ins ttt) Jie) — 
2 Fat nes ee = 90) oe 
3 eine ete! lra Se oe 
4 ete Mist! Groh fs i. 
5 TA Geaa Galette ae 
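
A balanced set can also be generated by computer. The sketch below is one way of
doing so, not the construction used for Table 11.7: it builds a Hadamard matrix by
Sylvester's doubling construction and takes L of its non-constant columns as the
sign patterns (+1 for unit 1, -1 for unit 2). Because this construction only yields
orders that are powers of 2, g is taken as the smallest power of 2 (at least 4)
exceeding L, which is 8 for L = 5. The function names and the data are
hypothetical. The last lines check (11.82) for a linear estimator.

```python
import numpy as np

def sylvester_hadamard(order):
    """Hadamard matrix of the given order (a power of 2) by Sylvester doubling."""
    H = np.array([[1]])
    while H.shape[0] < order:
        H = np.block([[H, H], [H, -H]])
    return H

def balanced_half_samples(L):
    """g x L matrix of +1/-1 signs; row i says which unit each stratum gives to H_i."""
    g = 4
    while g <= L:          # smallest power of 2 (>= 4) that exceeds L
        g *= 2
    H = sylvester_hadamard(g)
    return H[:, 1:L + 1]   # skip the constant first column; columns are orthogonal

def brr_variance_linear(y1, y2, signs):
    """First BRR form, (1/g) * sum [f(H_i) - f(S)]^2, for the linear estimator."""
    y1, y2 = np.asarray(y1, dtype=float), np.asarray(y2, dtype=float)
    f_S = np.sum(y1 + y2)
    chosen = np.where(signs > 0, y1, y2)   # shape (g, L): unit picked in each stratum
    f_H = 2.0 * chosen.sum(axis=1)
    return float(np.mean((f_H - f_S) ** 2))

signs = balanced_half_samples(5)
rng = np.random.default_rng(0)
y1, y2 = rng.normal(size=5), rng.normal(size=5)   # hypothetical y'_{h1}, y'_{h2}
print(signs)
print(brr_variance_linear(y1, y2, signs), np.sum((y1 - y2) ** 2))
```

Orthogonality of the columns is what makes the cross-product terms between
every pair of strata cancel when the squared deviations are summed over the set.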


When this balanced repeated replication (BRR) method is applied to a nonlinear
function $f(Y)$ of several variables, the four estimators

$$\frac{1}{g}\sum_i^g [f(H_i) - f(S)]^2; \qquad \frac{1}{g}\sum_i^g [f(C_i) - f(S)]^2; \qquad \frac{1}{4g}\sum_i^g [f(H_i) - f(C_i)]^2$$

and the average of the first two, all differ to some extent. Also, $f(S) = f(\hat Y)$ is
biased, and the variance estimates do not include the correct bias contribution to
MSE$[f(\hat Y)]$. However, note that if we drew m independent samples $S_i$ each with a
complementary $H_i$, $C_i$, the quantity

$$\frac{1}{2m}\sum_i^m [f(H_i) - f(C_i)]^2 \qquad (11.84)$$

would be an unbiased estimate of $V[f(H)]$. Thus, roughly speaking, the BRR
method amounts to assuming, for $f(S)$, that (a) its bias is negligible, (b)
$V[f(S)] \doteq (1/2)V[f(H)]$, and (c) calculation of $v[f(S)]$ from BRR for a single
stratified S agrees well enough with its calculation from independent samples $S_i$ to
be useful.

The repeated replications method has been extensively used both for sample
selection and variance estimation by the U.S. Census Bureau in the 1950s and by
Deming (1956, 1960). The idea goes back to the Mahalanobis technique of
interpenetrating subsamples (Mahalanobis, 1944). McCarthy (1969) reviews the
BRR method, with further discussion by Kish and Frankel (1974).


11.20 THE JACKNIFE METHOD 


For $\hat R$ in simple random samples this method was described in section 6.17. It
was proposed as a means of estimating $V(\hat R_Q)$, but could also be tried for
estimating $V(\hat R)$. Omit unit j from the sample and calculate $\hat R_j$, the ratio estimate
from the rest of the sample. Ignoring the fpc, an approximate estimator of $V(\hat R)$
can be obtained from (6.86) with g = n. If we assume $\hat R = \hat R_-$, the mean of the $\hat R_j$,
this estimator becomes

$$v(\hat R) = \frac{(n-1)}{n}\sum_j^n (\hat R_j - \hat R)^2 = \frac{(n-1)}{n}\sum_j^n [f(S_j) - f(S)]^2 \qquad (11.85)$$

the expression on the extreme right indicating how the method might be applied to
other nonlinear estimation.
For a linear estimator like $\bar y$ it is easily verified that formula (11.85) reduces to
the usual $v(\bar y) = \sum (y_i - \bar y)^2/n(n-1)$.
In extending this method to stratified samples with $n_h = 2$, Frankel (1971)
suggests omitting one unit at random from stratum h for $h = 1, 2, \ldots, L$ in turn,
calculating $f(S_h)$ from the remaining sample of size $(2L - 1)$. One form of the
Jacknife estimate of $V[f(S)]$ is then

$$v[f(S)] = \sum_h^L [f(S_h) - f(S)]^2 \qquad (11.86)$$

For a linear estimator $f(\hat Y)$, this variance estimator reduces to the usual unbiased
estimator in ppz sampling with replacement.

As with BRR, there are four analogous versions of the Jacknife estimator of
$V[f(S)]$.
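
The stratified form (11.86) can be sketched as below. One assumption is made that
the text does not spell out: when a unit is omitted from stratum h, the retained
unit is given double weight, so that the stratum total is re-estimated as $2y_{jhi}'$.
The function name and the data are hypothetical. For a linear estimator the
result does not depend on which unit is omitted and equals (11.78).

```python
import numpy as np

def stratified_jackknife(f, y, rng):
    """f(S) and the variance estimate (11.86) for n_h = 2.

    y: array of shape (k, L, 2) holding y'_{jhi} for k variables and L strata.
    f: function of the vector of k estimated totals.
    """
    y = np.asarray(y, dtype=float)
    k, L, _ = y.shape
    totals = y.sum(axis=(1, 2))            # estimated totals of the k variables
    f_S = f(totals)
    v = 0.0
    for h in range(L):
        drop = int(rng.integers(2))        # unit omitted at random from stratum h
        keep = 1 - drop
        # Assumption: the retained unit is doubled, so the total changes by
        # y'_{jh,keep} - y'_{jh,drop} for each variable j
        totals_h = totals - y[:, h, drop] + y[:, h, keep]
        v += (f(totals_h) - f_S) ** 2
    return f_S, float(v)

rng = np.random.default_rng(1)
y = rng.uniform(1.0, 5.0, size=(2, 6, 2))   # hypothetical: 2 variables, 6 strata

# Linear estimator (total of variable 0): reduces to sum_h (y'_h1 - y'_h2)^2
_, v_lin = stratified_jackknife(lambda t: t[0], y, rng)
usual = float(np.sum((y[0, :, 0] - y[0, :, 1]) ** 2))

# Nonlinear estimator: ratio of the two totals
ratio, v_ratio = stratified_jackknife(lambda t: t[0] / t[1], y, rng)
print(v_lin, usual, ratio, v_ratio)
```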


11.21 COMPARISON OF THE THREE APPROACHES 


The Taylor, BRR, and J methods have had very limited rigorous theoretical
justification. Appraisal of their performance with different estimators and types of
surveys has relied thus far primarily on Monte Carlo studies. One study is
described by Frankel (1971) and Kish and Frankel (1974).

The sample was a single-stage sample of cluster units of slightly differing sizes, a 
unit having on the average 14.1 households. The population of N = 3240 units or 
45,737 households was divided in turn into 6, 12, and 30 equal-sized strata, with 
n, = 2 in each stratum. Simple random sampling within strata was used. The data 
were taken from the Current Population Survey (U.S. Census Bureau). The eight 
original variables for a household were the number of persons, number under 18, 
number in labor force, income, age, sex, and years of schooling of household head, 
and total income. Various means (ratios), differences of means, and simple, partial 
and multiple regression or correlation coefficients (three independent variables) 
were computed. In numbers of primary units the samples were quite small, 12, 24, 
and 60, being about 170, 340, and 850 in numbers of households. 

In computing variance estimators, all four forms of the BRR and the J methods 
were compared. The Taylor method was not used for partial and multiple 
regressions, because of difficulty in expressing the partial derivatives af/aY; in 
manageable form. Since the sampling method was proportional stratification 
without replacement, the fpc (1 —f) was included in all variance estimates, with 
f=n/N, although scarcely needed with these small samples. 

Except for the multiple correlation coefficients, the estimators $f(\hat Y)$ had
relatively unimportant biases for all three sample sizes, the ratios |Bias|/s.e. being
<0.1 for ratios, differences of ratios, and simple regression coefficients, and
around 0.2 for simple and partial correlation coefficients. When the approximate
$v(\hat Y)$ are regarded as estimators of MSE($\hat Y$), all three methods (Taylor, BRR, and
J) performed well for ratios and differences of ratios, the average |Bias in
$v(\hat Y)$|/MSE($\hat Y$) being under 5%. For simple correlation coefficients, the BRR
variance estimators had substantially smaller biases than the T or J estimators.
The reverse was true for simple regression coefficients. With both BRR and J the
best of the four $v(\hat Y)$ methods was the average of the H and C estimates (i.e., with
BRR)

$$v[f(\hat Y)] = \frac{1}{2g}\sum_i^g \left\{[f(H_i) - f(S)]^2 + [f(C_i) - f(S)]^2\right\} \qquad (11.87)$$


In confidence interval statements and tests of significance, the important issue is
how well the tail probabilities of the variate $[f(\hat Y) - f(Y)]/\text{s.e.}[f(\hat Y)]$ agree with
those of Student's t with L d.f. For selected t-values Frankel (1971) gives the
Monte Carlo estimates of the tail frequencies for the related variate
$[f(\hat Y) - Ef(\hat Y)]/\text{s.e.}[f(\hat Y)]$. Some results for 6 and 30 strata are shown in Table 11.8
for the t-values that Frankel presents nearest to the 5 and 10% two-tailed levels.
Version (11.87) of the BRR and J methods is the one shown.


TABLE 11.8
AVERAGE TAIL PROBABILITIES OF $[f(\hat Y) - Ef(\hat Y)]/\text{s.e.}[f(\hat Y)]$
COMPARED WITH THOSE OF STUDENT'S t

6 strata
                 P(t) = .042, t = 2.576        P(t) = .098, t = 1.960
                 BRR     J       Taylor        BRR     J       Taylor
Ratios           .044    .049    .052          .096    .106    .112
b's              .034    .048    .058          …       .117    .127
r's              .052    .069    .084          .114    .137    .163
Partial r's      .043    .063    -             .092    .132    -
Multiple R's     .065    .088    -             .105    .160    -

30 strata
                 P(t) = .059, t = 1.960        P(t) = .110, t = 1.645
                 BRR     J       Taylor        BRR     J       Taylor
Ratios           .056    .057    .057          …       …       …
b's              .062    .067    .068          .101    …       …
r's              .089    .098    .102          .138    .153    …
Partial r's      .103    .121    -             .156    .181    -
Multiple R's     .175    .207    -             .265    .297    -

b = simple regression coefficient.
r = simple correlation coefficient.




To summarize, BRR consistently performs best in Table 11.8. Except for
multiple R's, it can be regarded as adequate for practical use if one takes the view
that in data analysis a tabular 5% tail value represents an actual tail value
somewhere between 3 and 8%. J does slightly better than Taylor. Except for BRR
with ratios and simple regressions, all methods give actual tail frequencies higher
than the t-tables, so that confidence probabilities are overstated. A puzzling
feature is that for correlation coefficients the increase in sample size from 12 to 60
has not brought a corresponding improvement in the closeness of the actual to the
t tail frequencies.

This study opens up a wide area for investigation of the methods with different 
survey plans and different types of estimator f(¥). 

In a Monte Carlo study of a larger, more complex sample (two-stage pps
sampling with replacement, including both stratification and poststratification)
Bean (1975) compared the Taylor and BRR methods for estimators of the ratio
type. Both methods gave satisfactory variance estimates and adequate two-sided
confidence probabilities calculated from the normal distribution. Sufficient
skewness remained, however, so that one-sided confidence intervals could not be
trusted.


EXERCISES 


11.1 By working out the estimates for all possible samples that can be drawn from the
artificial population in Table 11.1, by methods Ia, Ib, II, and III, verify the total MSE's
given in Table 11.2.

11.2 For methods II (equal probabilities, unbiased estimate) and III (pps selection),
recompute the variances of $\hat Y$ for the example in Table 11.1 when $m_i = 1$. Show that the
precision of method III in relation to method II is lower for $m_i = 1$ than for $m_i = 2$. What
general result does this illustrate?

11.3 For the population in Table 11.1, if the estimated sizes $z_i$ are 0.1, 0.3 and 0.6, with
$m_i = 2$, show that the unbiased estimate (method IV) gives a smaller variance than pps
sampling. What is the explanation of this result?

11.4 The elements in a population with three primary units are classified into two
classes. The unit sizes $M_i$ and the proportions $P_i$ of elements that belong to the first class are
as follows.

$M_1 = 100$, $M_2 = 200$, $M_3 = 300$, $P_1 = 0.40$, $P_2 = 0.45$, $P_3 = 0.35$

For a sample consisting of 50 elements from one primary unit, compare the MSE's of
methods Ia, II, and III for estimating the proportion of elements in the first class in the
population. (In the variance formulas in section 11.2, $S_{2i}^2$ is approximately $P_iQ_i$.)

11.5 A sample of n primary units is selected with equal probabilities. From each chosen
unit, a constant fraction $f_2$ of the subunits is taken. If $a_i$ out of the $m_i$ subunits in the ith unit
fall in class C, show that the ratio-to-size estimate (section 11.8) of the population
proportion in class C is $p = \sum a_i/\sum m_i$. From formula (11.36), show that an estimate of
MSE(p) is

$$v(p) = \frac{1 - f_1}{n\bar M^2}\sum^n \frac{M_i^2(p_i - p)^2}{n - 1} + \frac{f_1(1 - f_2)}{n^2\bar M^2}\sum^n \frac{M_i^2 p_i q_i}{m_i - 1}$$

where $p_i = a_i/m_i$.
11.6 A firm with 36 factories decides to check the condition of some equipment of
which $M_0 = 25,012$ pieces are in use. A random sample of 12 factories is taken, a 10%
subsample being checked in each selected factory. The numbers of pieces checked ($m_i$) and
the numbers found with signs of deterioration ($a_i$) are as follows.

Factory   m_i   a_i   p_i = a_i/m_i      Factory   m_i   a_i   p_i = a_i/m_i
   1       65     8      0.123              7       85    18      0.212
   2       82    21      0.256              8       73    11      0.151
   3       52     4      0.077              9       50     7      0.140
   4       91    12      0.132             10       76     9      0.118
   5       62     1      0.016             11       64    20      0.312
   6       69     3      0.043             12       50     2      0.040


Estimate the percentage and the total number of defective pieces in use and give
estimates of their standard errors.

Note. Since $M_i/\bar M = m_i/\bar m$, the between-units component of v(p) may be computed as

$$\frac{1 - f_1}{n\bar m^2(n - 1)}\left(\sum a_i^2 - 2p\sum a_i m_i + p^2 \sum m_i^2\right)$$

and, since the $m_i$ are fairly large, the within-units component as

$$\frac{f_1(1 - f_2)}{(n\bar m)^2}\sum a_i q_i$$
11.7 If primary units are selected with equal probabilities and $f_2$ is constant, show that
in the notation of exercise 11.5 the unbiased estimate of a population proportion is
$p = N\sum a_i/nM_0f_2$ and that, if terms in $1/m_i$ are negligible, its variance may be computed as


1-f, ts aha 1(1—f2) 2 
oT a +S aa 


Calculate p and its standard error for the data in exercise 11.6. 

11.8 A sample of n primary units is chosen with probabilities proportional to estimated
sizes $z_i$ (with replacement) and with a constant expected over-all sampling fraction $f_0$. Show
that the unbiased and the ratio-to-size estimates of the population total are, respectively,
$T/f_0$ and $M_0T/\sum m_i$, where T is the sample total. (It follows that if $M_0$ is not known the
unbiased estimate can be used, but not the ratio to size. For estimating the population mean
per subunit, the situation is reversed.)

11.9 In a study of overcrowding in a large city one stratum contained 100 blocks of
which 10 were chosen with probabilities proportional to estimated size (with replacement).
An expected over-all sampling fraction $f_0 = 2\%$ was used. Estimate the total number of
persons and the average persons per room and their s.e.'s from the data below.

Block       1     2     3     4     5     6     7     8     9    10
Rooms      60    52    58    56    62    51    72    48    71    58
Persons   115    80    82    93   105   109   130    93   109    95


11.10 For Durbin's method (section 11.10) of simplifying variance estimation in ppz
sampling without replacement, a simple method of sample selection, due essentially to Kish
(1965), is as follows. The subscript h to denote the stratum will be omitted and the number
of primary units is assumed to be even.

Arrange the units in order of increasing $z_i$ and mark them off in pairs. The method is
exact only if $z_i = z_j$ for members of the same pair; this will be assumed here. Select two units
ppz with replacement. If two different units are drawn, accept both. If the same unit is drawn
twice, let the sample consist of the two members of the pair to which this unit belongs. Show
that for this method: (a) $\pi_i = 2z_i$, (b) for units not in the same pair, $\pi_{ij} = 2z_iz_j = \pi_i\pi_j/2$, so
that $\pi_i\pi_j\pi_{ij}^{-1} - 1 = 1$, and (c) for units in the same pair, $\pi_{ij} = 4z_iz_j = \pi_i\pi_j$, so that
$\pi_i\pi_j\pi_{ij}^{-1} - 1 = 0$.

11.11 In section 11.9, formula (11.33) for $V(\hat Y_{ppz})$ in sampling with replacement was
proved under the plan that whenever the ith unit was selected, an independent simple
random subsample of size $m_i$ was drawn from the whole of the unit. Prove the following
results for two alternative plans.

(a) When the ith unit is selected $t_i$ times, a simple random subsample of size $m_it_i$ is
drawn from it (assume $m_it_i \le M_i$). Under this plan, $V(\hat Y_{ppz})$ in (11.41) is reduced by

$$(n - 1)\sum_i^N M_iS_{2i}^2/n \qquad \text{(Sukhatme, 1954).}$$

(b) When the ith unit is selected $t_i$ times, a simple random subsample of size $m_i$ is
drawn. Then $V(\hat Y_{ppz})$ in (11.41) is increased by

$$\frac{(n - 1)}{n}\sum_i^N M_i^2(1 - f_{2i})S_{2i}^2/m_i$$

In both (a) and (b), $\hat Y_{ppz} = \sum_i^N t_iM_i\bar y_i/nz_i$, the ith unit receiving weight $t_i$.


CHAPTER 12


Double Sampling 


12.1 DESCRIPTION OF THE TECHNIQUE 


As we have seen, a number of sampling techniques depend on the possession of
advance information about an auxiliary variate $x_i$. Ratio and regression estimates
require a knowledge of the population mean $\bar X$. If it is desired to stratify the
population according to the values of the $x_i$, their frequency distribution must be
known.

When such information is lacking, it is sometimes relatively cheap to take a
large preliminary sample in which $x_i$ alone is measured. The purpose of this
sample is to furnish a good estimate of $\bar X$ or of the frequency distribution of $x_i$. In a
survey whose function is to make estimates for some other variate $y_i$, it may pay to
devote part of the resources to this preliminary sample, although this means that
the size of the sample in the main survey on $y_i$ must be decreased. This technique is
known as double sampling or two-phase sampling. As the discussion implies, the
technique is profitable only if the gain in precision from ratio or regression
estimates or stratification more than offsets the loss in precision due to the
reduction in the size of the main sample.

Double sampling may be appropriate when the information about x; is on file 
cards that have not been tabulated. For instance, in surveys of the German civilian 
population in 1945, the sample from any town was usually drawn from rationing 
registration lists. In addition to geographic stratification within the town, for 
which data were usually already available, stratification by sex and age was 
proposed. Since the sample had to be drawn in a hurry and since the lists were in 
constant use, tabulation of the complete age and sex distribution was not feasible. 
A moderately large systematic sample could, however, be drawn quickly. Each 
person drawn was classified into the appropriate age-sex class. From these data 
the much smaller list of persons to be interviewed was selected.


12.2 DOUBLE SAMPLING FOR STRATIFICATION 


The theory was first given by Neyman (1938). The population is to be stratified
into L classes (strata). The first sample is a simple random sample of size n'.
Let

$W_h = N_h/N$ = proportion of population falling in stratum h

$w_h = n_h'/n'$ = proportion of first sample falling in stratum h

Then $w_h$ is an unbiased estimate of $W_h$.

The second sample is a stratified random sample of size n in which the $y_{hi}$ are
measured: $n_h$ units are drawn from stratum h. Usually the second sample in
stratum h is a random subsample from the $n_h'$ in the stratum. The objective of the
first sample is to estimate the strata weights; that of the second sample is to
estimate the strata means $\bar Y_h$.

The population mean $\bar Y = \sum W_h\bar Y_h$. As an estimate we use

$$\bar y_{st} = \sum_h^L w_h\bar y_h \qquad (12.1)$$

The problem is to choose n' and the $n_h$ to minimize $V(\bar y_{st})$ for given cost.

We must then verify whether the minimum variance is smaller than can be
attained by a single simple random sample in which $y_i$ alone is measured. In
presenting the theory, we assume that the $n_h$ are a random subsample of the $n_h'$.
Thus, $n_h = \nu_hn_h'$, where $0 < \nu_h \le 1$ and the $\nu_h$ are chosen in advance. Repeated
sampling implies a fresh drawing of both the first and the second samples, so that
the $w_h$, $n_h$, and $\bar y_h$ are all random variables. The problem is therefore one of
stratification in which the strata sizes are not known exactly (section 5A.2).

Two approximations will be made for simplicity. The first sample size n' is
assumed large enough so that every $w_h > 0$. Second, when we come to discuss
optimum strategy, every optimum $\nu_h$ as found by the formula is assumed $\le 1$.


Theorem 12.1. The estimate $\bar y_{st}$ is unbiased.


Proof. Average first over samples in which the $w_h$ are fixed. Since $\bar y_h$ is the
mean of a simple random sample from the stratum, $E(\bar y_h) = \bar Y_h$. Furthermore,
when we average over different selections of the first sample, $E(w_h) = W_h$, since
the first sample is itself a simple random sample. Hence

$$E(\bar y_{st}) = E\left[E\left(\sum w_h\bar y_h \mid w_h\right)\right] = E\left(\sum w_h\bar Y_h\right) = \sum W_h\bar Y_h = \bar Y \qquad (12.2)$$
Theorem 12.2. If the first sample is random and of size n', the second sample
is a random subsample of the first, of size $n_h = \nu_hn_h'$, where $0 < \nu_h \le 1$, and the $\nu_h$
are fixed,

$$V(\bar y_{st}) = S^2\left(\frac{1}{n'} - \frac{1}{N}\right) + \sum_h^L \frac{W_hS_h^2}{n'}\left(\frac{1}{\nu_h} - 1\right) \qquad (12.3)$$

where $S^2$ is the population variance.
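
Formula (12.3) is easy to evaluate directly. The sketch below is a minimal
illustration: the function name, the first-sample size, and the subsampling
fractions are hypothetical, while the stratum weights and variances are the corn
acres figures quoted in the example of section 12.3. Passing `N=np.inf` drops the
term $S^2/N$.

```python
import numpy as np

def double_sampling_variance(S2, W, Sh2, nu, n_first, N=np.inf):
    """V(y_st) from (12.3): S^2(1/n' - 1/N) + sum_h W_h S_h^2 / n' * (1/nu_h - 1)."""
    W, Sh2, nu = (np.asarray(a, dtype=float) for a in (W, Sh2, nu))
    first_phase = S2 * (1.0 / n_first - 1.0 / N)
    second_phase = np.sum(W * Sh2 / n_first * (1.0 / nu - 1.0))
    return float(first_phase + second_phase)

W = [0.786, 0.214]      # stratum weights
Sh2 = [312.0, 922.0]    # within-stratum variances
S2 = 620.0              # population variance

# With nu_h = 1 every first-sample unit is measured, and the variance
# is that of a simple random sample of size n'.
v_all = double_sampling_variance(S2, W, Sh2, nu=[1.0, 1.0], n_first=100)

# Hypothetical plan: measure half of the first sample in each stratum.
v_half = double_sampling_variance(S2, W, Sh2, nu=[0.5, 0.5], n_first=100)
print(v_all, v_half)
```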


Proof. The proof is easily obtained by the following device. Suppose that the
$y_{hi}$ were measured on all $n_h'$ first-sample units in stratum h, not just on the random
subsample of $n_h$. Then, since $w_h = n_h'/n'$,

$$\sum_h^L w_h\bar y_h' = \bar y'$$

is the mean of a simple random sample of size n' from the population. Hence,
averaging over repeated selections of the sample of size n',

$$V\left(\sum_h^L w_h\bar y_h'\right) = S^2\left(\frac{1}{n'} - \frac{1}{N}\right) \qquad (12.4)$$

But

$$\bar y_{st} = \sum_h^L w_h\bar y_h = \sum_h^L w_h\bar y_h' + \sum_h^L w_h(\bar y_h - \bar y_h') \qquad (12.5)$$

Let the subscript 2 refer to an average over all random subsamples of $n_h$ units that
can be drawn from a given $n_h'$ units. Clearly, $E_2(\bar y_h) = \bar y_h'$. Results that follow
immediately are (see exercise 2.16):

$$\text{cov}[\bar y_h', (\bar y_h - \bar y_h')] = 0; \qquad V(\bar y_h - \bar y_h') = V(\bar y_h) - V(\bar y_h')$$

Hence, for fixed $w_h$,

$$V\left[\sum_h^L w_h(\bar y_h - \bar y_h')\right] = \sum_h^L w_h^2S_h^2\left(\frac{1}{n_h} - \frac{1}{n_h'}\right) \qquad (12.6)$$

$$= \sum_h^L \frac{w_hS_h^2}{n'}\left(\frac{1}{\nu_h} - 1\right) \qquad (12.7)$$

since $n_h = \nu_hn_h' = \nu_hw_hn'$.
Averaging over the distribution of the $w_h$ obtained by repeated selections of the
first sample, we have, from (12.4), (12.5) and (12.7),

$$V(\bar y_{st}) = S^2\left(\frac{1}{n'} - \frac{1}{N}\right) + \sum_h^L \frac{W_hS_h^2}{n'}\left(\frac{1}{\nu_h} - 1\right) \qquad (12.8)$$

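Corollary 1. The result can be expressed in a number of different forms. By
the analysis of variance,

$$(N - 1)S^2 = \sum_h (N_h - 1)S_h^2 + \sum_h N_h(\bar Y_h - \bar Y)^2 \qquad (12.9)$$

Hence, if $g' = (N - n')/(N - 1)$, multiplying by $g'/n'N$ gives

$$\frac{(N - n')S^2}{n'N} = S^2\left(\frac{1}{n'} - \frac{1}{N}\right) = \frac{g'}{n'}\sum_h\left(W_h - \frac{1}{N}\right)S_h^2 + \frac{g'}{n'}\sum_h W_h(\bar Y_h - \bar Y)^2 \qquad (12.10)$$

From (12.3) this gives

$$V(\bar y_{st}) = \sum_h^L \frac{W_hS_h^2}{n'}\left(\frac{1}{\nu_h} - 1\right) + \frac{g'}{n'}\sum_h^L\left(W_h - \frac{1}{N}\right)S_h^2 + \frac{g'}{n'}\sum_h^L W_h(\bar Y_h - \bar Y)^2 \qquad (12.11)$$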
Furthermore, by the definition of $g' = (N - n')/(N - 1)$, it follows that

$$\frac{g'}{n'} - \frac{1}{n'} = \frac{g'}{n'N} - \frac{1}{N} \qquad (12.12)$$

Hence, in (12.11), the second and third terms in $\sum W_hS_h^2$, which have coefficients
$-1/n'$ and $g'/n'$, may be written alternatively, giving

$$V(\bar y_{st}) = \sum_h^L W_hS_h^2\left(\frac{1}{n'\nu_h} - \frac{1}{N}\right) + \frac{g'}{n'N}\sum_h^L (W_h - 1)S_h^2 + \frac{g'}{n'}\sum_h^L W_h(\bar Y_h - \bar Y)^2 \qquad (12.13)$$

If classification is cheap, it may not be reasonable to assume n'/N negligible, since
a sizable proportion of the population may be classified. For most applications,
however, the term in $g'/n'N$ in (12.13) can be neglected. In this event (12.13)
simplifies to

$$V(\bar y_{st}) \doteq \sum_h^L W_hS_h^2\left(\frac{1}{n'\nu_h} - \frac{1}{N}\right) + \frac{g'}{n'}\sum_h^L W_h(\bar Y_h - \bar Y)^2 \qquad (12.14)$$

The results in theorem 12.2 were given by Rao (1973).


Corollary 2. If a proportion is being estimated in the second sample, the
expressions $S_h^2 = N_hP_hQ_h/(N_h - 1)$ and

$$(\bar Y_h - \bar Y)^2 = (P_h - P)^2$$

are substituted in (12.3), (12.11), and (12.13). In (12.14) we have, for $n'\nu_h/N$
also negligible,

$$V(p_{st}) \doteq \sum_h^L \frac{W_hP_hQ_h}{n'\nu_h} + \frac{g'}{n'}\sum_h^L W_h(P_h - P)^2 \qquad (12.15)$$


Corollary 3. Results for the case where the second sample is drawn
independently of the first, so that the $n_h$ do not depend on the $n_h'$ (except for the
assumption $n_h \le n_h'$), were given in the second edition, p. 329. For $n_h/N_h$
negligible, the leading term in the variance has the same structure as (12.14),
being,

$$V(\bar y_{st}) \doteq \sum_h^L \frac{W_h^2S_h^2}{n_h} + \frac{g'}{n'}\sum_h^L W_h(\bar Y_h - \bar Y)^2 \qquad (12.16)$$

Papers by Robson (1952) and Robson and King (1953) extend the stratification 
theory to two-stage sampling, applying it to the estimation of magazine reader- 
ship. 
12.3 OPTIMUM ALLOCATION


The objective is to choose the n' and $\nu_h$ so as to minimize $V(\bar y_{st})$ for specified
cost. Let c' be the cost of classification per unit and $c_h$ the cost of measuring a unit
in stratum h. For a specific sample,

$$C = c'n' + \sum_h c_hn_h \qquad (12.17)$$

Since the $n_h$ are random variables, we minimize the expected cost for chosen n'
and $\nu_h$.

$$E(C) = C^* = c'n' + n'\sum_h c_h\nu_hW_h \qquad (12.18)$$
For $V = V(\bar y_{st})$, formula (12.3) leads to

$$n'(V + S^2/N) = \left(S^2 - \sum W_hS_h^2\right) + \sum \frac{W_hS_h^2}{\nu_h} \qquad (12.19)$$

The product $C^*(V + S^2/N)$ does not involve n'. Application of the Cauchy-Schwarz
inequality to this product shows that the product is minimized if, for
every h,

$$\frac{\nu_h^2c_h}{S_h^2} = \frac{c'}{S^2 - \sum W_hS_h^2} \qquad (12.20)$$

This gives

$$\nu_h = S_h\left[c'\Big/c_h\left(S^2 - \sum W_hS_h^2\right)\right]^{1/2} \qquad (12.21)$$

The value of n' is obtained from the expected cost equation (12.18).
By substitution of the optimum v, in the formula for C*(V+S7/N), the 
minimum variance is found to be 
GE 


Vmin Fu) = -AE Wisa t (5? -LW NeT a (2.22) 


Use of formula (12.21) for sample allocation in practice demands more knowledge of the population than the sampler is likely to have. Fortunately, errors in guessing have compensating features. Thus, if the $\nu_h$ in (12.21) are too high, $n'$ from (12.18) will be too low and the stratum weights will not be determined as well as they should be. However, in partial compensation, the stratum means $\bar{y}_h$ will be determined more precisely than under the optimum solution. When there is little advance knowledge of the stratum weights $W_h$, Srinath (1971) and Rao (1973) suggest a slightly different method of choosing the $n_h$, thought to be more robust against poor guesses at the $W_h$.

The case that presents the easiest allocation problem is that in which $c_h$ and $S_h$ are constant, say $c_h = c$ and $S_h = S_w$. Then (12.21) becomes

$$\nu_h = \frac{S_w\sqrt{c'}}{\sqrt{c}\,\sqrt{S^2 - S_w^2}} = \left[\frac{c'}{c(\phi - 1)}\right]^{1/2} \tag{12.23}$$

where $\phi = S^2/S_w^2$ is the relative efficiency of proportional stratification to simple random sampling. Thus, if we guessed that stratification would reduce $V(\bar{y})$ in half so that $\phi = 2$, then $\nu_h = (c'/c)^{1/2}$.


Example. This example did not arise from a double sampling problem, but illustrates the use of the formulas. We use the Jefferson data from p. 168. In estimating corn acres, assume that we could either take a simple random sample of farms or devote some resources to classifying farms into two strata by farm size. Relevant population data are for corn acres.

Strata        $W_h$     $S_h^2$   $S_h$    $\bar{Y}_h$
1             0.786     312       17.7     19.404
2             0.214     922       30.4     51.626
Population              620                26.300


Suppose that $C^* = 100$, $c_h = 1 = c$, and that $S^2/N$ is negligible. This implies that if double sampling is not used, we can afford to take a sample of $n = 100$ farms, giving $V(\bar{y}) = 6.20$. Let $c'$ be the cost per farm of classifying farms into stratum 1 ($\le 160$ acres) and stratum 2 ($> 160$ acres). Consider the questions:

1. For what values of $c'/c$ does double sampling bring an increase in precision?
2. What is the optimum double sampling plan if $c' = c/100$, and what is the resulting $V(\bar{y}_{st})$?
3. In problem 2, how do the plan and the value of $V(\bar{y}_{st})$ change if the $\nu_h$ are guessed as twice the optimum fractions?

1. From the population data, $\sum W_h S_h = 20.4$, $(S^2 - \sum W_h S_h^2) = 177$. Hence from (12.22)

$$V_{min}(\bar{y}_{st}) = 0.01(20.4 + 13.3\sqrt{c'})^2$$

If this is to be less than 6.20 for simple random sampling, $c' < 0.11$; that is, $c'/c < 1/9$.

2. If $c'/c = 1/100$, then with the optimum plan, (12.22) gives

$$V_{min}(\bar{y}_{st}) = 0.01(20.4 + 1.3)^2 = 4.71$$

Note that if classification by farm size cost nothing, we would have

$$V_{min}(\bar{y}_{st}) = 0.01(20.4)^2 = 4.16$$

As regards the details of the plan with $c'/c = 1/100$, from (12.21), we get

$$\nu_h = S_h/133; \quad \nu_1 = 0.133; \quad \nu_2 = 0.229$$

Since $\sum W_h\nu_h = 0.1535$, we find, from (12.18), that $n' = 612$. In turn this gives expected values of 64, 30 for $n_1$, $n_2$. Thus nearly all the money is spent on measurement: only 6% on classification.

3. If we guess $\nu_1 = 0.266$, $\nu_2 = 0.458$, then $\sum W_h\nu_h = 0.307$ and from (12.18), $n' = 315$, leading to an expected $n_1 = 66$, $n_2 = 31$. From (12.3), $V(\bar{y}_{st})$ for this plan will be found to be 4.85, only a 3% increase over the optimum 4.71 in problem 2.
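
The arithmetic of problems 2 and 3 can be retraced with a short script. This is only a sketch under the example's assumptions ($S^2/N$ negligible, equal measurement costs); the function names are illustrative and not from the text. It uses $S_h = \sqrt{S_h^2}$ without rounding, so the results differ slightly from the rounded figures above.

```python
import math

def optimum_double_sampling(W, S, S2_pop, c_class, c_meas, budget):
    """Optimum nu_h, n' and minimum variance from (12.21), (12.18), (12.22).

    Assumes S^2/N is negligible and one measurement cost c for all strata.
    """
    within = sum(w * s * s for w, s in zip(W, S))        # sum W_h S_h^2
    between = S2_pop - within                            # S^2 - sum W_h S_h^2
    nu = [s * math.sqrt(c_class / (c_meas * between)) for s in S]    # (12.21)
    cost_per_first_unit = c_class + c_meas * sum(w * v for w, v in zip(W, nu))
    n_first = budget / cost_per_first_unit               # (12.18) solved for n'
    root = (sum(w * s for w, s in zip(W, S)) * math.sqrt(c_meas)
            + math.sqrt(between * c_class))
    v_min = root ** 2 / budget                           # (12.22) without S^2/N
    return nu, n_first, v_min

def variance_for_plan(W, S, S2_pop, nu, n_first):
    """V(ybar_st) from (12.3) with the S^2/N term neglected."""
    extra = sum(w * s * s * (1.0 / v - 1.0) for w, s, v in zip(W, S, nu))
    return (S2_pop + extra) / n_first

# Jefferson data from the example: W_h, S_h^2, population S^2 = 620.
W = [0.786, 0.214]
S = [math.sqrt(312), math.sqrt(922)]
nu, n_first, v_min = optimum_double_sampling(W, S, 620.0, 0.01, 1.0, 100.0)

# Problem 3: fractions guessed as twice the optimum, same expected cost.
nu_guess = [2 * v for v in nu]
n_guess = 100.0 / (0.01 + sum(w * v for w, v in zip(W, nu_guess)))
v_guess = variance_for_plan(W, S, 620.0, nu_guess, n_guess)

print(nu, n_first, v_min, v_guess)
```

Evaluating `variance_for_plan` at the optimum fractions reproduces `v_min`, which is a convenient check that (12.3) and (12.22) have been coded consistently.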


12.4 ESTIMATED VARIANCE IN DOUBLE SAMPLING FOR
STRATIFICATION

If $1/n'$ and $1/N$ are both negligible with respect to 1 (e.g., $< 0.02$), an almost unbiased sample estimate of $V(\bar{y}_{st})$ in (12.14) is simply the sample copy of this formula.

$$v(\bar{y}_{st}) = \sum_h w_h s_h^2\left(\frac{1}{n'\nu_h} - \frac{1}{N}\right) + \frac{g'}{n'}\sum_h w_h(\bar{y}_h - \bar{y}_{st})^2 \tag{12.24}$$

$$= \sum_h \frac{w_h^2 s_h^2}{n_h} - \frac{1}{N}\sum_h w_h s_h^2 + \frac{g'}{n'}\sum_h w_h(\bar{y}_h - \bar{y}_{st})^2 \tag{12.24'}$$

where $g' = (N - n')/(N - 1)$. This formula will suffice for almost all applications.
With $1/N$ and $1/n'$ not negligible a more complex algebraic expression is needed.

Theorem 12.3. An unbiased sample estimate of $V(\bar{y}_{st})$ in double sampling is

$$v(\bar{y}_{st}) = \frac{n'(N-1)}{(n'-1)N}\left[\sum_h w_h s_h^2\left(\frac{1}{n'\nu_h} - \frac{1}{N}\right) + \frac{g'}{n'}\sum_h s_h^2\left(\frac{w_h}{N} - \frac{1}{n'\nu_h}\right) + \frac{g'}{n'}\sum_h w_h(\bar{y}_h - \bar{y}_{st})^2\right] \tag{12.25}$$

Proof. From (12.13) the general form of the variance to be estimated is

$$V(\bar{y}_{st}) = \sum_h W_h\left(\frac{1}{n'\nu_h} - \frac{1}{N}\right)S_h^2 + \frac{g'}{n'N}\sum_h (W_h - 1)S_h^2 + \frac{g'}{n'}\sum_h W_h(\bar{Y}_h - \bar{Y})^2 \tag{12.26}$$

By averaging first for fixed $n'$ and $w_h$ and then over variations in the $w_h$, the average of $w_h s_h^2$ in (12.25) is $W_h S_h^2$, while that of $s_h^2$ is $S_h^2$. These results will be used after equation (12.31).

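In the last term in (12.25)

$$\sum_h w_h(\bar{y}_h - \bar{y}_{st})^2 = \sum_h w_h\bar{y}_h^2 - \bar{y}_{st}^2 \tag{12.27}$$

Averaging first for fixed $w_h$,

$$E\left(\sum_h w_h\bar{y}_h^2\right) = \sum_h w_h\bar{Y}_h^2 + \sum_h w_h S_h^2\left(\frac{1}{n_h} - \frac{1}{N_h}\right) = \sum_h w_h\bar{Y}_h^2 + \sum_h S_h^2\left(\frac{1}{n'\nu_h} - \frac{w_h}{N_h}\right) \tag{12.28}$$

Furthermore,

$$EE\left(\sum_h w_h\bar{y}_h^2\right) = \sum_h W_h\bar{Y}_h^2 + \sum_h S_h^2\left(\frac{1}{n'\nu_h} - \frac{1}{N}\right) \tag{12.29}$$

Also,

$$E(\bar{y}_{st}^2) = \bar{Y}^2 + V(\bar{y}_{st}) \tag{12.30}$$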
Subtracting (12.30) from (12.29) and multiplying by $g'/n'$ gives

$$\frac{g'}{n'}E\sum_h w_h(\bar{y}_h - \bar{y}_{st})^2 = \frac{g'}{n'}\left[\sum_h W_h(\bar{Y}_h - \bar{Y})^2 + \sum_h S_h^2\left(\frac{1}{n'\nu_h} - \frac{1}{N}\right) - V(\bar{y}_{st})\right] \tag{12.31}$$

Substitute (12.31) in finding $(n'-1)NEv(\bar{y}_{st})/n'(N-1)$ from (12.25). We get

$$\frac{(n'-1)N}{n'(N-1)}Ev(\bar{y}_{st}) = \left(1 - \frac{g'}{n'}\right)V(\bar{y}_{st}) = \frac{(n'-1)N}{n'(N-1)}V(\bar{y}_{st})$$

This proves the result. Note that the two middle terms in (12.25) are of order $1/n'N$ and $1/n'^2\nu_h$ and are negligible relative to terms retained if $1/N$ and $1/n'$ are negligible. This supports the simpler form (12.24).

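Rao (1973) has given the result (12.25) in terms of the $n_h$ and $n_h'$ as follows.

$$v(\bar{y}_{st}) = \frac{N-1}{N}\sum_h\left(\frac{n_h'-1}{n'-1} - \frac{n_h-1}{N-1}\right)\frac{w_h s_h^2}{n_h} + \frac{(N-n')}{N(n'-1)}\sum_h w_h(\bar{y}_h - \bar{y}_{st})^2 \tag{12.32}$$

Corollary. To use (12.24) in the estimation of a proportion, put $p_h$ for $\bar{y}_h$ and $n_h p_h q_h/(n_h - 1)$ for $s_h^2$.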
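
As a numerical cross-check, the two forms (12.25) and (12.32) can be coded side by side and compared. The sketch below assumes $w_h = n_h'/n'$ and $\nu_h = n_h/n_h'$; the function names and the data are hypothetical, chosen only for illustration.

```python
def v_theorem_12_3(N, n_first, n_h_first, n_h, ybar_h, s2_h):
    """Estimate (12.25), with w_h = n_h'/n' and nu_h = n_h/n_h'."""
    g = (N - n_first) / (N - 1)
    w = [m / n_first for m in n_h_first]
    nu = [n / m for n, m in zip(n_h, n_h_first)]
    ybar_st = sum(wh * yh for wh, yh in zip(w, ybar_h))
    t1 = sum(wh * s2 * (1 / (n_first * v) - 1 / N)
             for wh, s2, v in zip(w, s2_h, nu))
    t2 = g / n_first * sum(s2 * (wh / N - 1 / (n_first * v))
                           for wh, s2, v in zip(w, s2_h, nu))
    t3 = g / n_first * sum(wh * (yh - ybar_st) ** 2
                           for wh, yh in zip(w, ybar_h))
    return n_first * (N - 1) / ((n_first - 1) * N) * (t1 + t2 + t3)

def v_rao(N, n_first, n_h_first, n_h, ybar_h, s2_h):
    """The same estimate in the form (12.32)."""
    w = [m / n_first for m in n_h_first]
    ybar_st = sum(wh * yh for wh, yh in zip(w, ybar_h))
    a = sum(((m - 1) / (n_first - 1) - (n - 1) / (N - 1)) * wh * s2 / n
            for m, n, wh, s2 in zip(n_h_first, n_h, w, s2_h))
    b = sum(wh * (yh - ybar_st) ** 2 for wh, yh in zip(w, ybar_h))
    return (N - 1) / N * a + (N - n_first) / (N * (n_first - 1)) * b

# Hypothetical two-stratum survey, for illustration only.
args = (5000, 400, [250, 150], [50, 60], [10.0, 25.0], [16.0, 49.0])
v1 = v_theorem_12_3(*args)
v2 = v_rao(*args)
print(v1, v2)
```

The two values agree to rounding error, as the algebra requires.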


Example. In a simple random sample of 374 households from a large district, 292 were occupied by white families and 82 by nonwhite families. A sample of about one in four households gave the following data on ownership.

              Owned    Rented    Total
White           31       43        74
Nonwhite         4       14        18

Estimate the proportion of rented households in the area from which the sample was drawn and its standard error.

If the first stratum consists of the white-occupied households,

$$w_1 = \frac{292}{374} = 0.78, \quad w_2 = \frac{82}{374} = 0.22$$

$$p_1 = \frac{43}{74} = 0.58, \quad p_2 = \frac{14}{18} = 0.78$$

$$p_{st} = w_1p_1 + w_2p_2 = 0.624$$

$$n' = 374, \quad n_1 = 74, \quad n_2 = 18$$

It is readily found that only the first term in (12.24) is of importance. Hence

$$v(p_{st}) \doteq \sum_h \frac{w_h}{n'\nu_h}\,\frac{n_h p_h q_h}{(n_h - 1)} = \sum_h \frac{w_h^2 p_h q_h}{(n_h - 1)} = \frac{(0.78)^2(0.2436)}{73} + \frac{(0.22)^2(0.1716)}{17} = 0.00252$$

$$s(p_{st}) = 0.050$$

The estimated proportion of rented households is 0.62 ± 0.050.
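
The household example can be recomputed directly from the counts. This is a minimal sketch that keeps only the leading term of (12.24), as the example does; the function name is illustrative.

```python
import math

def proportion_double_sampling(first_counts, second_counts, second_hits):
    """p_st and the leading term of v(p_st), with s_h^2 = n_h p_h q_h/(n_h - 1)."""
    n_first = sum(first_counts)
    w = [m / n_first for m in first_counts]                    # w_h = n_h'/n'
    p = [a / n for a, n in zip(second_hits, second_counts)]    # p_h
    p_st = sum(wh * ph for wh, ph in zip(w, p))
    v = sum(wh ** 2 * ph * (1 - ph) / (n - 1)
            for wh, ph, n in zip(w, p, second_counts))
    return p_st, math.sqrt(v)

# 292 white and 82 nonwhite households in the first sample;
# 43 of 74 and 14 of 18 rented in the second sample.
p_st, se = proportion_double_sampling([292, 82], [74, 18], [43, 14])
print(p_st, se)
```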


12.5 DOUBLE SAMPLING FOR ANALYTICAL 
COMPARISONS 


Section 5A.13 dealt with the determination of sample sizes in subgroups of the population for certain analytical comparisons among $L$ subgroup means. If the objective is to have the variances of the differences between the means of every pair of subgroups all equal to $V$, it was noted that a simple compromise allocation, often adequate, is to specify or minimize only the average variance of the $L(L-1)/2$ differences.

$$\frac{2}{L}\sum_{i=1}^{L}\frac{S_i^2}{n_i} = V \tag{12.33}$$

In another application, Booth and Sedransk (1969), the subgroups formed a 2 × 2 factorial classification, the problem being to estimate row and column effects with equal precision. In this case the variance to be specified or minimized took the form

$$\sum_i\sum_j\left(\theta_{i\cdot}^2 + \theta_{\cdot j}^2\right)\frac{S_{ij}^2}{n_{ij}} = V \tag{12.34}$$

where the $\theta_{i\cdot}$ and $\theta_{\cdot j}$ are weights chosen by the investigator, the symbols $(ij)$ denoting the 4 subgroups.

In the discussion in section 5A.13, subgroups could be sampled directly. The problem becomes one of double sampling when members of the subgroups cannot be identified in advance but units can be classified relatively cheaply into the subgroups from an initial simple random sample of size $n'$. If $n_i'$ of the initial sample fall in subgroup $i$, the $n_i$ for the final sample that is measured to provide the desired comparisons are drawn from the $n_i'$. The simplest cost function is

$$C = c'n' + c\sum n_i = c'n' + cn \tag{12.35}$$

where we assume $c' \ll c$.

For fixed $n_i$, both relations (12.33) and (12.34) are of the form

$$V = \sum_i \frac{a_i^2}{n_i} \tag{12.36}$$

where we assume that the $S_i$, $S_{ij}$ and hence the $a_i$ are known in advance.
When the objective is to minimize $V$ for given $C$, Sedransk (1965) and Booth and Sedransk (1969) try different values of $n'$ in turn. For given $n'$, the value of $n$ is then known from (12.35). For given $n$, the $n_i$ that minimize $V$ are $n_i = na_i/\sum a_j$. Difficulty may arise in using these $n_i$, however, because the $n_i'$ provided by the initial sample are random variables. In some subgroups we may find $n_i' < na_i/(\sum a_j)$ so that the minimizing $n_i$ exceeds the available $n_i'$. Allocation rules to handle this situation will be illustrated for $L = 3$ groups, from Sedransk (1965). Number the subgroups in increasing order of $n_i'(\sum a_j)/na_i$ and let $\sum_1 a_j$ denote the sum over all classes except the first. Take

$$\begin{aligned}
n_1 &= \frac{na_1}{\sum a_j},\quad n_2 = \frac{na_2}{\sum a_j} && \text{if } n_1' \ge \frac{na_1}{\sum a_j} \\
n_1 &= n_1',\quad n_2 = \frac{(n-n_1')a_2}{\sum_1 a_j} && \text{if } n_1' < \frac{na_1}{\sum a_j},\ n_2' \ge \frac{(n-n_1')a_2}{\sum_1 a_j} \\
n_1 &= n_1',\quad n_2 = n_2' && \text{if } n_1' < \frac{na_1}{\sum a_j},\ n_2' < \frac{(n-n_1')a_2}{\sum_1 a_j} \\
n_3 &= n - n_1 - n_2
\end{aligned} \tag{12.37}$$

where $\sum_1 a_j = \sum a_j - a_1$.

These rules are not complete, but will cover most $n'$ likely to be near optimal. The principle is to keep close to the $n_i \propto a_i$ allocation. See Booth and Sedransk (1969) for more detail.


Example. An example in which double sampling should perform well is as follows: $C = 2{,}000$, $c' = 1$, $c = 10$. The three subgroups are of relative sizes $W_1 = 0.05$, $W_2 = 0.25$, $W_3 = 0.70$, $a_i^2 = S_i^2 = 10$ $(i = 1, 2, 3)$. Consider first single sampling. Since it costs 11 monetary units to select, classify, and measure a sampling unit, we can afford $n = 182$ with single sampling.

Optimum allocation would require equal $n_i$, but on the average the values of $n_i$ from single sampling with $n = 182$ are 9.1, 45.5, 127.4. Assuming $E(1/n_i) \doteq 1/E(n_i)$, the average $V$ from single sampling is approximately

$$E(V) \doteq 10\left(\frac{1}{9.1} + \frac{1}{45.5} + \frac{1}{127.4}\right) = 1.40$$

With double sampling, calculations for different $n'$ show that the optimum $n'$ is close to 620, giving $n = 138$. With $n = 138$ the optimal $n_i$ is 46 in each class. However, $n' = 620$ provides only an expected 31 in class 1. Hence, on the average, the allocation rule (12.37) for double sampling gives $n_1 = E(n_1') = (0.05)(620) = 31$, $n_2 = n_3 = 53.5$, leading to

$$E(V) \doteq 10\left(\frac{1}{31} + \frac{2}{53.5}\right) = 0.70$$

a 50% reduction from $E(V)$ with single sampling.
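
The allocation rule can be sketched in code. The version below is an assumption-laden illustration, not Sedransk's published procedure: it applies the capping idea of (12.37) sequentially to any number of subgroups, gives the last subgroup in the ordering whatever remains, and uses the expected $n_i'$ in place of the random ones. Function names are illustrative.

```python
def sedransk_allocation(n, available, a):
    """Allocate n with n_i proportional to a_i, capped by the available n_i'.

    Subgroups are taken in increasing order of n_i'/a_i; the last one in
    that order receives the remainder, as in the final line of (12.37).
    """
    order = sorted(range(len(a)), key=lambda i: available[i] / a[i])
    alloc = [0.0] * len(a)
    remaining_n = float(n)
    remaining_a = float(sum(a))
    for position, i in enumerate(order):
        if position == len(order) - 1:
            alloc[i] = remaining_n
            break
        target = remaining_n * a[i] / remaining_a   # proportional share
        alloc[i] = min(target, available[i])        # cannot exceed n_i'
        remaining_n -= alloc[i]
        remaining_a -= a[i]
    return alloc

def average_variance(alloc, a):
    """V = sum a_i^2 / n_i, the form (12.36)."""
    return sum(ai ** 2 / ni for ai, ni in zip(a, alloc))

a = [10 ** 0.5] * 3                                  # a_i^2 = 10
expected_first = [0.05 * 620, 0.25 * 620, 0.70 * 620]
alloc = sedransk_allocation(138, expected_first, a)
v_double = average_variance(alloc, a)
v_single = average_variance([9.1, 45.5, 127.4], a)
print(alloc, v_double, v_single)
```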


Rao (1973) handles these problems by the method used with stratified sampling: this method specifies the fraction $\nu_i$ of the $n_i'$ in the $i$th subgroup that are to be measured. An advantage is that the optimum $n'$ can be determined analytically. With $C = c'n' + cn$ as before, the expected cost is

$$C^* = c'n' + cn'\sum_i \nu_iW_i \tag{12.38}$$

where $W_i$ is the relative size of subgroup $i$. Assuming as before that $E(1/n_i) \doteq 1/E(n_i)$, the average $V$ is

$$E(V) \doteq \sum_{i=1}^{L}\frac{a_i^2}{n'W_i\nu_i} \tag{12.39}$$


By the Cauchy-Schwarz inequality, the optimal $\nu_i$ for fixed $n'$ is given by

$$n'W_i\nu_i = \frac{a_i}{(\sum a_j)}\,\frac{(C^* - n'c')}{c} \tag{12.40}$$

provided all $\nu_i \le 1$. Substitution of $n'W_i\nu_i$ from (12.40) into (12.39) gives, for the minimal $E(V)$,

$$E(V) = \frac{c(\sum a_j)^2}{C^* - n'c'} \tag{12.41}$$

Since $E(V)$ in (12.41) decreases as $n'$ decreases, the optimum $n'$ will decrease at least until it reaches the value $m_1$, say, at which one of the $\nu_i$, say $\nu_1$, becomes 1. At this point (12.40) and (12.41) no longer apply. From (12.40), $\nu_1$ becomes 1 when

$$n'W_1 = \frac{a_1(C^* - n'c')}{c(\sum a_j)} \tag{12.42}$$

Solving (12.42) for $m_1$, the value of $n'$ at which $\nu_1 = 1$, we get

$$m_1 = \frac{C^*}{[c' + cW_1(\sum a_j)/a_1]} \tag{12.43}$$

To examine values of $n'$ smaller than $m_1$, set $\nu_1 = 1$ and use the Cauchy-Schwarz inequality to obtain the remaining $\nu_i$. We find, for $i > 1$,

$$n'W_i\nu_i = \frac{a_i}{(\sum_1 a_j)}\,\frac{C^* - n'(c' + cW_1)}{c} \tag{12.44}$$
where $\sum_1 a_j = \sum a_j - a_1$. The resulting minimal $E(V)$ is now

$$E(V) = \frac{a_1^2}{n'W_1} + \frac{c(\sum_1 a_j)^2}{C^* - n'(c' + cW_1)} \tag{12.45}$$

From this $E(V)$ the derivative $dE(V)/dn'$ vanishes at

$$n' = C^*\Big/\left\{(c' + cW_1) + \frac{(\sum_1 a_j)}{a_1}\left[cW_1(c' + cW_1)\right]^{1/2}\right\} \tag{12.46}$$

The value $m_2$ at which $\nu_1$ and $\nu_2$ are both 1, so that (12.45) ceases to hold, is

$$m_2 = C^*\Big/\left[(c' + cW_1) + \frac{cW_2(\sum_1 a_j)}{a_2}\right] \tag{12.47}$$

Thus expressions (12.45) and (12.46) apply only over the range $m_1 \ge n' \ge m_2$. If $dE(V)/dn'$ does not vanish for $n' \ge m_2$, we need to set $\nu_1$, $\nu_2$, and so forth, $= 1$ in turn until the turning point of $E(V)$ is found. In many situations for which double sampling is economical, however, the turning value of $E(V)$ occurs for $m_1 \ge n' \ge m_2$.


Example. For the worked example in this section, $C^* = 2000$, $c' = 1$, $c = 10$, $a_i^2 = S_i^2 = 10$, $W_i = 0.05, 0.25, 0.70$. We have, from (12.43) and (12.47),

$$m_1 = \frac{2000}{[1 + (10)(0.05)(3)]} = 800$$

$$m_2 = \frac{2000}{[1.5 + (10)(0.25)(2)]} = 308$$

Furthermore, $dE(V)/dn'$ in formula (12.46) vanishes at

$$n' = \frac{2000}{\{1.5 + (2)[(0.5)(1.5)]^{1/2}\}} = 619$$

Since this value lies between 800 and 308, it gives the required minimum. Formula (12.44) gives $\nu_2 = .346$, $\nu_3 = .124$. Numerically this solution is essentially the same as that found by Sedransk's method, which had $n' = 620$ and similar values of $n_i$ in the three subgroups.
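
The analytic solution is easy to evaluate numerically. The sketch below codes (12.43), (12.47), (12.46), and (12.44) for the case in which the turning point lies between $m_2$ and $m_1$; it assumes the subgroups are already numbered so that $\nu_1$ reaches 1 first, and the function names are illustrative.

```python
import math

def rao_first_phase_size(C, c_first, c, W, a):
    """m_1, m_2 and the turning point of E(V): (12.43), (12.47), (12.46)."""
    sum_a = sum(a)
    rest_a = sum_a - a[0]                  # sum over all classes but the first
    d = c_first + c * W[0]                 # c' + c W_1
    m1 = C / (c_first + c * W[0] * sum_a / a[0])
    m2 = C / (d + c * W[1] * rest_a / a[1])
    n_opt = C / (d + rest_a / a[0] * math.sqrt(c * W[0] * d))
    return m1, m2, n_opt

def rao_fractions(C, c_first, c, W, a, n_first):
    """nu_i from (12.44) for i > 1, with nu_1 set to 1."""
    rest_a = sum(a) - a[0]
    d = c_first + c * W[0]
    return [1.0] + [ai * (C - n_first * d) / (rest_a * c * n_first * wi)
                    for ai, wi in zip(a[1:], W[1:])]

W = [0.05, 0.25, 0.70]
a = [math.sqrt(10)] * 3
m1, m2, n_opt = rao_first_phase_size(2000, 1, 10, W, a)
nu = rao_fractions(2000, 1, 10, W, a, n_opt)
print(m1, m2, n_opt, nu)
```

If `n_opt` fell outside the interval from `m2` to `m1`, the formulas used here would no longer apply and the next $\nu_i$ would have to be set to 1, as the text describes.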


Both methods extend easily to the case of differential costs of measurement in different subgroups. Suggestions are also given (Rao, 1973) for the case where $E(1/n_i)$ is substantially larger than $1/E(n_i)$.


12.6 REGRESSION ESTIMATORS 


In some applications of double sampling the auxiliary variate $x_i$ has been used to make a regression estimate of $\bar{Y}$. In the first (large) sample of size $n'$, we measure only $x_i$; in the second, a random subsample of size $n = \nu n' = n'/k$ where the fraction $\nu$ is chosen in advance, we measure both $x_i$ and $y_i$. The estimate of $\bar{Y}$ is

$$\bar{y}_{lr} = \bar{y} + b(\bar{x}' - \bar{x}) \tag{12.48}$$

where $\bar{x}'$, $\bar{x}$ are the means of the $x_i$ in the first and second samples and $b$ is the least squares regression coefficient of $y_i$ on $x_i$ computed from the second sample.

If no assumption is made about the presence of a linear regression in the population, $\bar{y}_{lr}$ will be biased, just as in one-stage sampling (Chapter 7). An approximation to $V(\bar{y}_{lr})$ can be given, assuming random sampling and $1/n$ and $1/n'$ negligible with respect to 1.

$$V(\bar{y}_{lr}) \doteq \frac{S_y^2(1-\rho^2)}{n} + \frac{\rho^2S_y^2}{n'} - \frac{S_y^2}{N} \tag{12.49}$$


Proof. In finding the sampling error of $\bar{y}_{lr}$ in simple random sampling, we showed that if $b$ in $\bar{y}_{lr}$ is replaced by the finite population regression coefficient $B = S_{yx}/S_x^2$, the error in the approximation is of order $1/\sqrt{n}$ relative to that in $\bar{y}_{lr}$. The same device applies here. We therefore examine the variance of the approximation

$$\bar{y}_{lr} \doteq \bar{y} + B(\bar{x}' - \bar{x})$$

The subscripts 1, 2 will denote variations over the first and second phases of sampling. Let $u_i = y_i - Bx_i$. In the second phase, regard the large sample as a finite population. Then, since the small sample was drawn at random from the large sample,

$$E_2(\bar{y}_{lr}) \doteq \bar{y}'; \quad V_2(\bar{y}_{lr}) \doteq \left(\frac{1}{n} - \frac{1}{n'}\right)s_u'^2$$

where $s_u'^2$ is the variance of $u$ within the large sample. It follows that

$$V(\bar{y}_{lr}) \doteq V_1(\bar{y}') + E_1\left[\left(\frac{1}{n} - \frac{1}{n'}\right)s_u'^2\right] \tag{12.50}$$

$$= \left(\frac{1}{n'} - \frac{1}{N}\right)S_y^2 + \left(\frac{1}{n} - \frac{1}{n'}\right)S_y^2(1-\rho^2) \tag{12.51}$$

since $s_u'^2$ is an unbiased estimate of $S_u^2 = S_y^2(1-\rho^2)$. Hence

$$V(\bar{y}_{lr}) \doteq \frac{S_y^2(1-\rho^2)}{n} + \frac{\rho^2S_y^2}{n'} - \frac{S_y^2}{N} \tag{12.49}$$

This completes the proof.

As in section 7.8, suppose we assume, following Royall (1970), that the finite population is itself a random sample from an infinite superpopulation in which a linear regression model holds. Then $\bar{y}_{lr}$ becomes model-unbiased and exact small-sample results for its variance can be obtained. Let the regression model in the superpopulation be

$$y = \alpha + \beta x + e \tag{12.52}$$

where, for given $x$'s, the $e$'s are independent with means 0 and variance $\sigma_y^2(1-\rho^2)$, where $\sigma_y$ and $\rho$ are now parameters of the superpopulation. On substituting for $\bar{y}$, $\bar{Y}$, and $b$ from (12.52), straightforward algebra gives

$$\bar{y}_{lr} - \bar{Y} = \bar{e}_n - \bar{e}_N + \beta(\bar{x}' - \bar{X}) + (\bar{x}' - \bar{x})\frac{\sum e_i(x_i - \bar{x})}{\sum(x_i - \bar{x})^2} \tag{12.53}$$

By averaging over the distribution of the $e$'s, it follows from (12.53) that $\bar{y}_{lr}$ is model-unbiased for fixed $x$'s in the finite population and the two samples. Furthermore, from (12.53),

$$E(\bar{y}_{lr} - \bar{Y})^2 = \sigma_y^2(1-\rho^2)\left(\frac{1}{n} - \frac{1}{N}\right) + \beta^2(\bar{x}' - \bar{X})^2 + \sigma_y^2(1-\rho^2)\frac{(\bar{x}' - \bar{x})^2}{\sum(x_i - \bar{x})^2} \tag{12.54}$$

The last term on the right in (12.54) arises from the sampling error of $b$ and is of order $1/n$ relative to the first two terms on the right. Averaging the first two terms on the right over the distribution of the $x$'s created by repeated random selections of the finite population and the two samples, we get

$$EV(\bar{y}_{lr}) \doteq \sigma_y^2(1-\rho^2)\left(\frac{1}{n} - \frac{1}{N}\right) + \rho^2\sigma_y^2\left(\frac{1}{n'} - \frac{1}{N}\right) \tag{12.55}$$

$$= \frac{\sigma_y^2(1-\rho^2)}{n} + \frac{\rho^2\sigma_y^2}{n'} - \frac{\sigma_y^2}{N} \tag{12.56}$$


This expression has the same form as (12.49) except that in (12.56) $\sigma_y^2$ and $\rho$ refer to the superpopulation.

Double sampling with regression has been extended by Khan and Tripathi (1967) to the case where $p$ auxiliary $x$ variables are measured in the second sample, $\bar{Y}$ being estimated by the multiple linear regression of $y$ on these variables. With the second sample a random subsample of the first and with multivariate normality assumed for $y$ and the $x$'s, the extension of (12.54) for $p > 1$ gives for the average variance

$$V(\bar{y}_{lr}) = \frac{\sigma_y^2(1-R^2)}{n}\left[1 + \frac{(n'-n)}{n'}\,\frac{p}{(n-p-2)}\right] + \frac{R^2\sigma_y^2}{n'} \tag{12.57}$$

where $R$ is the multiple correlation coefficient between $y$ and the $x$'s.

The case where the second sample is drawn independently of the first was considered by Chameli Bose (1943).
12.7 OPTIMUM ALLOCATION AND COMPARISON WITH
SINGLE SAMPLING

From the variance formula (12.49), which assumes $1/n$ negligible, double sampling with a regression estimate can be compared with a single simple random sample with no regression adjustment. We have

$$V = \frac{S_y^2(1-\rho^2)}{n} + \frac{\rho^2S_y^2}{n'} - \frac{S_y^2}{N}, \quad C = cn + c'n' \tag{12.58}$$

By the Cauchy-Schwarz inequality, the product $VC$ is minimized when

$$\frac{cn^2}{S_y^2(1-\rho^2)} = \frac{c'n'^2}{\rho^2S_y^2} \quad\text{so that}\quad \frac{n}{n'} = \left[\frac{c'(1-\rho^2)}{c\rho^2}\right]^{1/2} \tag{12.59}$$

Substitution in $VC$ gives

$$(VC)_{min} = S_y^2\left(\sqrt{c(1-\rho^2)} + \sqrt{c'\rho^2}\right)^2 - \frac{CS_y^2}{N} \tag{12.60}$$

Thus, for a specified cost $C$,

$$V_{opt} = \frac{S_y^2\left(\sqrt{c(1-\rho^2)} + \sqrt{c'\rho^2}\right)^2}{C} - \frac{S_y^2}{N} \tag{12.61}$$

If all resources are devoted instead to a single sample with no regression adjustment, this sample has size $C/c$ and the variance of its mean is

$$V(\bar{y}) = \frac{cS_y^2}{C} - \frac{S_y^2}{N} \tag{12.62}$$

Hence, optimum use of double sampling gives a smaller variance if

$$c > \left(\sqrt{c(1-\rho^2)} + \sqrt{c'\rho^2}\right)^2 \tag{12.63}$$

This inequality may be expressed in two ways.

$$\frac{c}{c'} > \frac{(1 + \sqrt{1-\rho^2})^2}{\rho^2} \tag{12.64}$$

or

$$\rho^2 > \frac{4(c/c')}{(1 + c/c')^2} \tag{12.65}$$

Equations (12.64) and (12.65) give the critical ranges of $c/c'$ for given $\rho$ and of $\rho$ for given $c/c'$ that make double sampling profitable.
Figure 12.1 plots the values of the ratio $c/c'$ (on a log scale) against $\rho$. Curve I is the relationship when double and single sampling are equally precise; curve II holds when $V_{opt} = 0.8V(\bar{y})$, that is, when double sampling gives a 25% increase in precision; and curve III refers to a 50% increase in precision. For example, when $\rho = 0.8$, double sampling equals single sampling in precision if $c/c'$ is 4, gives a 25% increase in precision if $c/c'$ is about 7½, and a 50% increase if $c/c'$ is about 13.
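
The break-even cost ratios can be computed directly from (12.61) and (12.62). The sketch below ignores the fpc term $S_y^2/N$, under which $V_{opt}/V(\bar{y}) = (\sqrt{1-\rho^2} + \rho\sqrt{c'/c})^2$; the function name and its `relative_variance` parameter are illustrative, not from the text.

```python
import math

def breakeven_cost_ratio(rho, relative_variance=1.0):
    """Smallest c/c' for which V_opt <= relative_variance * V(ybar).

    With relative_variance = 1 this is the right-hand side of (12.64).
    The fpc is ignored.
    """
    slack = math.sqrt(relative_variance) - math.sqrt(1.0 - rho ** 2)
    if slack <= 0:
        # The requested precision exceeds the limit 1/(1 - rho^2),
        # so it cannot be reached even when c' = 0.
        return math.inf
    return (rho / slack) ** 2

equal_precision = breakeven_cost_ratio(0.8)          # curve I at rho = 0.8
gain_25_percent = breakeven_cost_ratio(0.8, 0.8)     # curve II at rho = 0.8
print(equal_precision, gain_25_percent)
```

For $\rho = 0.8$ the first value is 4, in agreement with (12.64), and the second is a little under 7½.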


Fig. 12.1 Relation between $c/c'$ and $\rho$ for three fixed values of the relative precision of double and single sampling. Vertical axis: ratio of cost per unit in second sample to cost per unit in first sample. Horizontal axis: $\rho$ = correlation between $y_i$ and $x_i$.
Curve I: double and single sampling equally precise.
Curve II: double sampling gives 25 per cent increase in precision.
Curve III: double sampling gives 50 per cent increase in precision.

For practical use, the curves overestimate the gains to be achieved from double sampling, because the best values of $n$ and $n'$ must either be estimated from previous data or be guessed. Some allowance for errors in these estimations should be made before deciding to adopt double sampling.

For any $\rho$, there is an upper limit to the gain in precision from double sampling. This occurs when information on $\bar{x}'$ is obtained free ($c' = 0$). The upper limit to the relative precision is $1/(1-\rho^2)$.
12.8 ESTIMATED VARIANCE IN DOUBLE SAMPLING
FOR REGRESSION

If terms in $1/n$ are negligible, $V(\bar{y}_{lr})$ is given by (12.49):

$$V(\bar{y}_{lr}) \doteq \frac{S_y^2(1-\rho^2)}{n} + \frac{\rho^2S_y^2}{n'} - \frac{S_y^2}{N}$$

With a linear regression model, the quantity

$$s_{y.x}^2 = \frac{1}{n-2}\left[\sum_{i=1}^{n}(y_i - \bar{y})^2 - b^2\sum_{i=1}^{n}(x_i - \bar{x})^2\right] \tag{12.66}$$

is an unbiased estimate of $S_y^2(1-\rho^2)$. Since

$$s_y^2 = \frac{\sum_{i=1}^{n}(y_i - \bar{y})^2}{n-1}$$

is an unbiased estimate of $S_y^2$, it follows that

$$s_y^2 - s_{y.x}^2$$

is an unbiased estimate of $\rho^2S_y^2$. Thus a sample estimate of $V(\bar{y}_{lr})$ is

$$v(\bar{y}_{lr}) = \frac{s_{y.x}^2}{n} + \frac{s_y^2 - s_{y.x}^2}{n'} - \frac{s_y^2}{N} \tag{12.67}$$


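If the second sample is small and terms in $1/n$ are not negligible relative to 1, an estimate of variance suggested for simple random samples from (12.54) is

$$v(\bar{y}_{lr}) = s_{y.x}^2\left[\frac{1}{n} + \frac{(\bar{x}' - \bar{x})^2}{\sum(x_i - \bar{x})^2}\right] + \frac{s_y^2 - s_{y.x}^2}{n'} - \frac{s_y^2}{N} \tag{12.68}$$

This is a hybrid of the conditional variance and the average variance.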
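
A minimal sketch of the large-sample calculation, combining the estimate (12.48) with the variance estimate built from (12.66) and (12.67). The function name and the tiny data set are hypothetical and serve only to show the arithmetic; a real application would need a second sample large enough for terms in $1/n$ to be negligible.

```python
def regression_double_sampling(x_first_mean, x, y, n_first, N):
    """ybar_lr from (12.48) and v(ybar_lr) from (12.66) and (12.67).

    x, y are the second-sample values; x_first_mean is the mean of x
    in the first (large) sample of size n_first.
    """
    n = len(y)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    syy = sum((yi - ybar) ** 2 for yi in y)
    b = sxy / sxx                                    # least squares slope
    s2_yx = (syy - b ** 2 * sxx) / (n - 2)           # (12.66)
    s2_y = syy / (n - 1)
    y_lr = ybar + b * (x_first_mean - xbar)          # (12.48)
    v = s2_yx / n + (s2_y - s2_yx) / n_first - s2_y / N    # (12.67)
    return y_lr, v

# Hypothetical data: five measured units, first sample of 50, N = 10000.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
y_lr, v = regression_double_sampling(3.5, x, y, 50, 10000)
print(y_lr, v)
```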


12.9 RATIO ESTIMATORS 


If the first sample is used to obtain $\bar{x}'$ as an estimate of $\bar{X}$ in a ratio estimate of $\bar{Y}$, the estimator of $\bar{Y}$ is

$$\bar{y}_R = \frac{\bar{y}}{\bar{x}}\bar{x}' \tag{12.69}$$

To find the approximate variance, write

$$\bar{y}_R - \bar{Y} = \frac{\bar{X}}{\bar{x}}(\bar{y} - R\bar{x}) + \frac{\bar{y}}{\bar{x}}(\bar{x}' - \bar{X})$$

The first component is the error of the ordinary ratio estimate (section 2.11). In obtaining the appropriate error variance in section 2.11, we replaced the factor $\bar{X}/\bar{x}$ by unity in this term. To the same order of approximation, we replace the factor $\bar{y}/\bar{x}$ in the second component by the population ratio $R = \bar{Y}/\bar{X}$. Thus

$$\bar{y}_R - \bar{Y} \doteq (\bar{y} - R\bar{x}) + R(\bar{x}' - \bar{X}) \tag{12.70}$$

If the second sample is a random subsample of the first,

$$E_2(\bar{y}_R - \bar{Y}) \doteq \bar{y}' - \bar{Y}; \quad V_2(\bar{y}_R - \bar{Y}) \doteq \left(\frac{1}{n} - \frac{1}{n'}\right)s_d'^2 \tag{12.71}$$

where $s_d'^2$ is the variance within the first sample of the variate $d_i = y_i - Rx_i$. Averaging now over repeated random selections of the first sample,

$$V(\bar{y}_R) = V_1E_2 + E_1V_2 \doteq \left(\frac{1}{n'} - \frac{1}{N}\right)S_y^2 + \left(\frac{1}{n} - \frac{1}{n'}\right)(S_y^2 - 2RS_{yx} + R^2S_x^2) \tag{12.72}$$

since $s_d'^2$ is an unbiased estimate of $S_d^2 = S_y^2 - 2RS_{yx} + R^2S_x^2$. Separating the terms in $1/n$ and $1/n'$ we get

$$V(\bar{y}_R) \doteq \frac{S_y^2 - 2RS_{yx} + R^2S_x^2}{n} + \frac{2RS_{yx} - R^2S_x^2}{n'} - \frac{S_y^2}{N} \tag{12.72'}$$




12.10 REPEATED SAMPLING OF THE SAME POPULATION 


The practice of relying on samples for the collection of important series of data 
that are published at regular intervals has become common. In part, this is due to a 
realization that with a dynamic population a census at infrequent intervals is of 
limited use. Highly precise information about the characteristics of a population in 
July 1960 and July 1970 may not help much in planning that demands a 
knowledge of the population in 1976. A series of small samples at annual or even 
shorter intervals may be more serviceable.

When the same population (apart from the changes that the passage of time introduces) is sampled repeatedly, the sampler is in an ideal position to make realistic estimates both of costs and of variances and to apply the techniques that lead to optimum efficiency of sampling. One important question is how frequently and in what manner the sample should be changed as time progresses. Many considerations affect the decision. People may be unwilling to give the same type of information time after time. The respondents may be influenced by information which they receive at the interviews, and this may make them progressively less representative as time proceeds. Sometimes, however, cooperation is better in a second interview than in the first, and when the information is technical or confidential the second visit may produce more accurate data than the first.

The remainder of this chapter considers the question of replacement of the sample and the related question of making estimates from the series of repeated samples. The topic is appropriate to the present chapter because double sampling techniques can be utilized.

Given the data from a series of samples, there are three kinds of quantity for which we may wish estimates.


1. The change in $\bar{Y}$ from one occasion to the next.
2. The average value of $\bar{Y}$ over all occasions.
3. The average value of $\bar{Y}$ for the most recent occasion.


In most surveys, interest centers on the current average (3), particularly if the characteristics of the population are likely to change rapidly with time. With a population in which time changes are slow, on the other hand, an annual average (2) taken over 12 monthly samples or four quarterly samples may be adequate for the major uses. This would be the situation in a study of the prevalence of chronic diseases of long duration. With a disease whose prevalence shows marked seasonal variation, the current data are of major interest, but annual averages are also useful for comparisons between different regions and different years. Estimates of change (1) are wanted mainly in attempts to study the effects of forces that are known to have acted on the population. For instance, if a bill is passed that is supposed to stimulate the building of houses, it is interesting to know whether the building rate of new houses has increased in the succeeding year (with a realization that an increase may not be entirely due to the bill).
Suppose that we are free to alter or retain the composition of the sample and that the total size of sample is to be the same on all occasions. If we wish to maximize precision, the following statements can be made about replacement policy:

1. For estimating change, it is best to retain the same sample throughout all occasions.
2. For estimating the average over all occasions, it is best to draw a new sample on each occasion.
3. For current estimates, equal precision is obtained either by keeping the same sample or by changing it on every occasion. Replacement of part of the sample on each occasion may be better than these alternatives.

Statements 1 and 2 hold because there is nearly always a positive correlation between the measurements on the same unit on two successive occasions. The estimated change on a unit has variance $S_1^2 + S_2^2 - 2\rho S_1S_2$, where the subscripts refer to the occasions. If the change is estimated from two different units, the variance is $S_1^2 + S_2^2$. In estimating the over-all mean for the two occasions, the variance is $(S_1^2 + S_2^2 + 2\rho S_1S_2)/4$ if the same unit is retained and $(S_1^2 + S_2^2)/4$ if a new unit is chosen.

Statement 3, which is less obvious, is investigated in succeeding sections.
12.11 SAMPLING ON TWO OCCASIONS

Suppose that the samples are of the same size $n$ on both occasions and that the current estimates are of primary interest. Replacement policy has been examined for this case. Assume that simple random sampling is used. The mean of the first sample has variance $S_1^2/n$. On the second occasion, $m$ of the units are matched with (retained from) the first occasion and $u = n - m$ are unmatched.

Notation.

$\bar{y}_{hu}$ = mean of unmatched portion on occasion $h$

$\bar{y}_{hm}$ = mean of matched portion on occasion $h$

$\bar{y}_h$ = mean of whole sample on occasion $h$

The unmatched and matched portions of the second sample provide independent estimates $\bar{y}_{2u}'$, $\bar{y}_{2m}'$ of $\bar{Y}_2$ (Table 12.1). In the matched portion we use a double sampling regression estimate, in which the "large" sample is the first sample and the auxiliary variate $x_i$ is the value of $y_i$ on the first occasion. Thus our $m$ and $n$ correspond to $n$ and $n'$, respectively, in (12.49). The fpc is ignored.

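TABLE 12.1
ESTIMATES FROM THE UNMATCHED AND MATCHED PORTIONS

              Estimate                                                         Variance
Unmatched:    $\bar{y}_{2u}' = \bar{y}_{2u}$                                   $S_2^2/u = 1/W_{2u}$
Matched:      $\bar{y}_{2m}' = \bar{y}_{2m} + b(\bar{y}_1 - \bar{y}_{1m})$     $S_2^2(1-\rho^2)/m + \rho^2S_2^2/n = 1/W_{2m}$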

The best combined estimate of $\bar{Y}_2$ is found by weighting the two independent estimates inversely as their variances. If $W_{2u}$, $W_{2m}$ are the inverse variances, this estimate is

$$\bar{y}_2' = \phi_2\bar{y}_{2u}' + (1 - \phi_2)\bar{y}_{2m}'$$

where

$$\phi_2 = \frac{W_{2u}}{W_{2u} + W_{2m}}$$

By least squares theory, the variance of $\bar{y}_2'$ is

$$V(\bar{y}_2') = \frac{1}{W_{2u} + W_{2m}} \tag{12.73}$$
From Table 12.1, this works out after simplification as

$$V(\bar{y}_2') = \frac{S_2^2(n - u\rho^2)}{n^2 - u^2\rho^2} \tag{12.74}$$

Note that if $u = 0$ (complete matching) or if $u = n$ (no matching) this variance has the same value, $S_2^2/n$.

The optimum value of $u$ is found by minimizing (12.74) with respect to variation in $u$. This gives

$$\frac{u}{n} = \frac{1}{1 + \sqrt{1-\rho^2}}, \quad \frac{m}{n} = \frac{\sqrt{1-\rho^2}}{1 + \sqrt{1-\rho^2}} \tag{12.75}$$

When the optimum $u$ is substituted in (12.74), the minimum variance works out as

$$V_{opt}(\bar{y}_2') = \frac{S_2^2}{2n}\left(1 + \sqrt{1-\rho^2}\right) \tag{12.76}$$

Table 12.2 shows for a series of values of $\rho$ the optimum percent that should be matched and the relative gain in precision compared with no matching. The best percentage to match never exceeds 50% and decreases steadily as $\rho$ increases. When $\rho = 1$, the formula suggests $m = 0$, which lies outside the range of our assumptions, since $m$ has been assumed reasonably large. The correct procedure in this case is to take $m = 2$. The two matched units are sufficient to determine the regression line exactly.

TABLE 12.2
OPTIMUM % MATCHED

           Optimum       % gain in      % gain in precision with
$\rho$     % matched     precision      $m/n = 1/3$    $m/n = 1/4$
0.5           46             7               7              6
0.6           44            11              11              9
0.7           42            17              17             15
0.8           38            25              25             23
0.9           30            39              39             39
0.95          24            52              50             52
1.0            0           100              67             75

The greatest attainable gain in precision is 100% when $\rho = 1$. Unless $\rho$ is high the gains are modest.
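
The entries of Table 12.2 follow from (12.74) and (12.75) and can be regenerated with a few lines. This is an illustrative sketch; the function names are not from the text, and the case $\rho = 1$ with the formal optimum $m = 0$ is left out because (12.74) is then indeterminate.

```python
import math

def gain_percent(rho, matched_fraction):
    """Percent gain in precision over no matching, from (12.74)."""
    mu = 1.0 - matched_fraction                        # unmatched fraction u/n
    ratio = (1 - mu * rho ** 2) / (1 - mu ** 2 * rho ** 2)   # V / (S^2/n)
    return 100.0 * (1.0 / ratio - 1.0)

def optimum_matched_fraction(rho):
    """Optimum m/n from (12.75)."""
    root = math.sqrt(1 - rho ** 2)
    return root / (1 + root)

for rho in (0.5, 0.6, 0.7, 0.8, 0.9, 0.95):
    m_opt = optimum_matched_fraction(rho)
    print(f"{rho:5.2f} {100 * m_opt:6.1f} {gain_percent(rho, m_opt):6.1f} "
          f"{gain_percent(rho, 1 / 3):6.1f} {gain_percent(rho, 1 / 4):6.1f}")
```

Printing to one decimal place shows how close the fixed fractions 1/3 and 1/4 come to the optimum gain for moderate $\rho$.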
Although the optimum percentage to match varies with $\rho$, only a single percentage can be used in practice for all items in a survey. The right-hand columns of Table 12.2 show the percent gains in precision when one third and one fourth of the units are matched. Both are good compromises, except for items in which $\rho$ exceeds 0.95.

Kulldorff (1963) gives an extensive discussion of this model. He considers the case where on the second occasion the cost of measuring a matched unit (i.e., one previously measured) may differ from that of measuring a new unmatched unit and does not assume equal sample sizes on two occasions. Thus, apart from fixed costs, his cost on the second occasion is

$$C_2 = mc_m + uc_u: \quad \frac{C_2}{c_u} = m\delta + u \tag{12.77}$$

where $\delta = c_m/c_u$. If sample sizes are the same on the two occasions so that $m + u = n$, the optimum unmatched proportion on the second occasion is found by minimizing

$$\frac{VC_2}{c_uS_2^2} = [n\delta + u(1-\delta)]\frac{(n - u\rho^2)}{(n^2 - u^2\rho^2)} = [\delta + \mu(1-\delta)]\frac{(1 - \mu\rho^2)}{(1 - \mu^2\rho^2)} \tag{12.78}$$

where $\mu = u/n$ and $V$ comes from (12.74). If $\delta < 1$, matching being cheaper, the optimum proportion matched is, of course, greater than the values in Table 12.2.

He also deals with the case where the costs are to be the same on the two occasions.
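
The minimization of (12.78) over $\mu = u/n$ has no simple closed form in general, but a crude grid search is enough to see the effect of $\delta$. This is a sketch only; the function names and the grid resolution are arbitrary choices, not part of Kulldorff's treatment.

```python
def cost_variance_product(mu, rho, delta):
    """The right-hand side of (12.78) as a function of mu = u/n."""
    return ((delta + mu * (1 - delta)) * (1 - mu * rho ** 2)
            / (1 - mu ** 2 * rho ** 2))

def optimum_unmatched_fraction(rho, delta, steps=10000):
    """Grid search over mu in [0, 1) for the minimizer of (12.78)."""
    grid = [k / steps for k in range(steps)]
    return min(grid, key=lambda mu: cost_variance_product(mu, rho, delta))

mu_equal_cost = optimum_unmatched_fraction(0.8, 1.0)   # delta = 1
mu_cheap_match = optimum_unmatched_fraction(0.8, 0.5)  # matched units half price
print(mu_equal_cost, mu_cheap_match)
```

With $\delta = 1$ the search returns the value given by (12.75); with $\delta < 1$ the minimizing $\mu$ is smaller, that is, more of the sample is matched.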


In some applications the data for occasion 1 provide several auxiliary variables correlated with $y_2$, one of which will, of course, usually be $y_1$. For example, in estimating the kill $y_2$ of waterfowl per hunter in Ontario from 1968 to 1969, Sen (1973a) found that the kill per hunter and the number of days hunted in 1967 and 1968 were both correlated with $y_2$. In this paper he extended the preceding analysis to the case where $\bar{y}_{2m}$ is adjusted by its multiple linear regression on the auxiliary variables and where the samples on the two occasions are of unequal sizes. With large samples of equal size the only change in (12.76) for $V_{opt}(\bar{y}_2')$ is to replace $\rho^2$ by the square $R^2$ of the multiple correlation coefficient between $y_2$ and the auxiliary variables (assuming multivariate normality). The corresponding theory for the case where $\bar{y}_{2m}$ is adjusted by the multivariate ratio estimate is given in Sen (1972) for equal sample sizes and in Sen (1973b) for unequal sample sizes.


12.12 SAMPLING ON MORE THAN TWO OCCASIONS 


The general problem of replacement has been studied by Yates (1960) and Patterson (1950), with respect to both current estimates and estimates of change. When there are more than two occasions, the opportunities for a flexible use of the data are increased. On occasion $h$ we may have parts of the sample that are matched with occasion $h-1$, parts that are matched with both occasions $h-1$ and $h-2$, and so on. In attempting to improve the current estimate, we might try a multiple regression involving all matchings to previous occasions. It is also possible to revise the current estimate for occasion $h-1$ after the data for occasion $h$ are known. In the revised estimate the regression of occasion $h-1$ on both occasion $h-2$ and occasion $h$ could be utilized, assuming that suitably matched portions of the sample were available.


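TABLE 12.3
ESTIMATES OF $\bar{Y}_h$ ON THE $h$TH OCCASION

            Estimate                                                                  Variance
Unmatched:  $\bar{y}_{hu}' = \bar{y}_{hu}$                                             $S^2/u_h = 1/W_{hu}$
Matched:    $\bar{y}_{hm}' = \bar{y}_{hm} + \rho(\bar{y}_{h-1}' - \bar{y}_{h-1,m})$    $S^2(1-\rho^2)/m_h + \rho^2 V(\bar{y}_{h-1}') = 1/W_{hm}$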

The present section contains an introduction to the subject. Attention will be
restricted to current estimates in which only the regression on the sample
immediately preceding is used. This results in some loss of precision but, since the
correlation $\rho$ usually decreases as the time interval between the occasions is
increased, the loss of precision will seldom be great. The variance $S^2$ and the
correlation coefficient $\rho$ between the item values on the same unit on two
successive occasions are assumed constant throughout.

On the $h$th occasion let $m_h$ and $u_h$ be the numbers of units that are matched and
unmatched, respectively, with the $(h - 1)$th occasion. The two estimates of $\bar{Y}_h$ that
can be made are given in Table 12.3. The only change in procedure from the
second occasion (Table 12.1) is that in the regression adjustment of the estimate
from the matched portion we use the improved estimate $\bar{y}_{h-1}'$ instead of the
sample mean $\bar{y}_{h-1}$.


The variance of the matched estimate $\bar{y}_{hm}'$ in Table 12.3 is derived from (12.49)
on page 339. Note that (a) our $m$ corresponds to the $n$ in (12.49) and (b) the term
$\rho^2 S_y^2/n'$ in (12.49), which equals $B^2 V(\bar{x}')$, is replaced by $\rho^2 V(\bar{y}_{h-1}')$, since $B = \rho$
when $S$ is constant on successive occasions and $\bar{y}_{h-1}'$ corresponds to $\bar{x}'$ in the
earlier analysis.

We now examine the precision obtained if the optimum $m_h$ and $u_h$ and the
optimum weights are used on every occasion. It will be found that the optimum
$m_h/n$ increases steadily on successive occasions, rapidly approaching a limiting
value of $\frac{1}{2}$.

Weighting inversely as the variance, the best estimate of $\bar{Y}_h$ is

$$\bar{y}_h' = \phi_h \bar{y}_{hu}' + (1 - \phi_h)\,\bar{y}_{hm}' \tag{12.79}$$

where $\phi_h = W_{hu}/(W_{hu} + W_{hm})$. This gives

$$V(\bar{y}_h') = \frac{1}{W_{hu} + W_{hm}} = \frac{g_h S^2}{n}$$

where $g_h$ denotes the ratio of the variance on occasion $h$ to that on the first
occasion. Substituting for $W_{hu}$, $W_{hm}$ from Table 12.3, we have

$$\frac{S^2}{V(\bar{y}_h')} = \frac{n}{g_h} = S^2(W_{hu} + W_{hm}) = u_h + \frac{1}{\dfrac{1-\rho^2}{m_h} + \dfrac{\rho^2 g_{h-1}}{n}} \tag{12.80}$$


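We now choose $m_h$ and $u_h$ to maximize this quantity and therefore to minimize
$V(\bar{y}_h')$. Writing $u_h = n - m_h$ and differentiating the right side of (12.80) with
respect to $m_h$, we obtain

$$\frac{1-\rho^2}{m_h^2} = \left(\frac{1-\rho^2}{m_h} + \frac{\rho^2 g_{h-1}}{n}\right)^2$$

This gives, on solving for the optimum $m_h$, say $\hat{m}_h$,

$$\frac{\hat{m}_h}{n} = \frac{\sqrt{1-\rho^2}}{g_{h-1}\left(1 + \sqrt{1-\rho^2}\right)} \tag{12.81}$$

When this value is substituted in (12.80), the relation becomes, after some
algebraic manipulation,

$$\frac{1}{g_h} = 1 + \frac{1 - \sqrt{1-\rho^2}}{g_{h-1}\left(1 + \sqrt{1-\rho^2}\right)} \tag{12.82}$$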


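This relation may be written

$$r_h = 1 + b\,r_{h-1}$$

where $r_h = 1/g_h$ and $r_1 = 1/g_1 = 1$. Repeated use of this recurrence relation gives

$$\frac{1}{g_h} = r_h = 1 + b + b^2 + \cdots + b^{h-1} = \frac{1 - b^h}{1 - b}$$

where, from (12.82), $b = (1 - \sqrt{1-\rho^2})/(1 + \sqrt{1-\rho^2})$. Since $0 < b < 1$, the limiting
variance factor $g_\infty$ is

$$g_\infty = 1 - b = \frac{2\sqrt{1-\rho^2}}{1 + \sqrt{1-\rho^2}} \tag{12.83}$$

Hence the variance of $\bar{y}_h'$ tends to

$$V(\bar{y}_\infty') = \frac{S^2}{n}\left(\frac{2\sqrt{1-\rho^2}}{1 + \sqrt{1-\rho^2}}\right) \tag{12.84}$$

Finally, the limiting value of $\hat{m}_h$ is obtained from (12.81) as

$$\frac{\hat{m}_\infty}{n} = \frac{\sqrt{1-\rho^2}}{g_\infty\left(1 + \sqrt{1-\rho^2}\right)} = \frac{1}{2}$$

irrespective of the value of $\rho$.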

Table 12.4 shows the optimum percentage to match, $100\hat{m}_h/n$ as found from
(12.81), and the resulting variances for $\rho$ = 0.7, 0.8, 0.9, and 0.95 and for a series
of values of $h$.
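The recurrence behind these values is easy to evaluate numerically. The following is a minimal sketch, assuming the forms of (12.81), (12.82), and (12.83) quoted in the comments; the function names `optimum_matching` and `limiting_g` are illustrative and not part of the text.

```python
import math


def optimum_matching(rho, occasions):
    """Return [(h, percent_matched, g_h)] for h = 2..occasions.

    Assumes g_1 = 1 and applies, on each occasion,
      (12.81)  m_h/n = s / (g_{h-1} (1 + s))
      (12.82)  1/g_h = 1 + (1 - s) / (g_{h-1} (1 + s))
    with s = sqrt(1 - rho^2).
    """
    s = math.sqrt(1.0 - rho ** 2)
    g_prev = 1.0
    rows = []
    for h in range(2, occasions + 1):
        m_frac = s / (g_prev * (1.0 + s))
        g = 1.0 / (1.0 + (1.0 - s) / (g_prev * (1.0 + s)))
        rows.append((h, 100.0 * m_frac, g))
        g_prev = g
    return rows


def limiting_g(rho):
    """Limiting variance factor, (12.83): g = 2s / (1 + s)."""
    s = math.sqrt(1.0 - rho ** 2)
    return 2.0 * s / (1.0 + s)


if __name__ == "__main__":
    for rho in (0.7, 0.8, 0.9, 0.95):
        for h, pct, g in optimum_matching(rho, 5):
            print(f"rho={rho:4.2f}  h={h}  % matched={pct:5.1f}  g_h={g:6.4f}")
```

For $\rho = 0.8$ the second occasion gives 37.5% matched with $g_2 = 0.8$, and later occasions move toward 50% matched and $g_\infty = 0.75$.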


TABLE 12.4
OPTIMUM % MATCHED AND VARIANCES

        % matched, $100\hat{m}_h/n$            $g_h = nV(\bar{y}_h')/S^2$
 $h$    $\rho$ = 0.7   0.8   0.9   0.95         $\rho$ = 0.7   0.8   0.9   0.95

By the fourth occasion, the optimum percent matched is close to 50 for all the
values of $\rho$ shown, although a smaller amount of matching is indicated for the
second and third occasions. The reductions in variance, $(1 - g_h)$, are modest if $\rho$ is
less than 0.8.


12.13 SIMPLIFICATIONS AND FURTHER DEVELOPMENTS 


In practical application the preceding analysis may need modification. We
assumed that all replacement policies cost the same and are equally feasible. With
human populations, field costs are likely ... for a number of occasions. If estimates
of the change ... matched constant, we will investigate the
variances of $\bar{y}_h'$ and of the estimated change $(\bar{y}_h' - \bar{y}_{h-1}')$ when $m$, $u$, and $\phi$ are held
constant. We continue to write $V(\bar{y}_h') = g_h S^2/n$, although the actual value of $g_h$
will be different from that in the preceding section.

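The estimate is now

$$\bar{y}_h' = \phi\,\bar{y}_{hu}' + (1 - \phi)\,\bar{y}_{hm}'$$

Substituting the expressions for the two variances (from Table 12.3), we have

$$\frac{g_h S^2}{n} = V(\bar{y}_h') = \phi^2 V(\bar{y}_{hu}') + (1-\phi)^2 V(\bar{y}_{hm}') = \frac{\phi^2 S^2}{u} + (1-\phi)^2\left[\frac{S^2(1-\rho^2)}{m} + \frac{\rho^2 g_{h-1} S^2}{n}\right]$$

Hence

$$g_h = \frac{\phi^2}{\mu} + \frac{(1-\phi)^2(1-\rho^2)}{\lambda} + \rho^2(1-\phi)^2 g_{h-1} \tag{12.85}$$

where $\mu = u/n$, $\lambda = m/n$. Write this relation as

$$g_h = a + b\,g_{h-1}$$

By repeated application, we have, since $g_1 = 1$,

$$g_h = \frac{a(1 - b^{h-1})}{1 - b} + b^{h-1}$$

Since $b = \rho^2(1-\phi)^2$ is less than 1, the limiting value is

$$g_\infty = \frac{a}{1-b} = \frac{\lambda\phi^2 + \mu(1-\phi)^2(1-\rho^2)}{\lambda\mu\left[1 - \rho^2(1-\phi)^2\right]} \tag{12.86}$$

The value of the weight $\phi$ that minimizes the limiting variance may be found by
differentiating (12.86). This leads to a quadratic equation whose appropriate root
is

$$\phi_{opt} = \frac{\sqrt{1-\rho^2}\left[\sqrt{1-\rho^2 + 4\lambda\mu\rho^2} - \sqrt{1-\rho^2}\right]}{2\lambda\rho^2}$$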

In practice, the value of $\rho$ will not be known exactly and will differ from item to
item. A compromise value can usually be chosen. Clearly, $\phi_{opt}$ will be less than
$\mu = u/n$, since the matched part of the sample gives higher precision per unit than
the unmatched part. For example, with $\mu = 0.25$, that is, $\frac{1}{4}$ of the sample
unmatched, $\phi_{opt}$ turns out to be 0.216, 0.198, and 0.164 for $\rho$ = 0.7, 0.8, 0.9. The
choice of $\phi = 0.2$ would be adequate for this range of $\rho$.
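The quoted weights can be checked directly. The sketch below assumes that the limiting variance factor has the form $g_\infty = a/(1-b)$ with $a = \phi^2/\mu + (1-\phi)^2(1-\rho^2)/\lambda$ and $b = \rho^2(1-\phi)^2$, and that the optimum weight is the root $\phi_{opt} = \sqrt{1-\rho^2}\,[\sqrt{1-\rho^2+4\lambda\mu\rho^2} - \sqrt{1-\rho^2}]/(2\lambda\rho^2)$; the function names are illustrative.

```python
import math


def g_limit(rho, lam, phi):
    """Limiting variance factor for a fixed weight phi and matched fraction lam."""
    mu = 1.0 - lam
    a = phi ** 2 / mu + (1.0 - phi) ** 2 * (1.0 - rho ** 2) / lam
    b = rho ** 2 * (1.0 - phi) ** 2
    return a / (1.0 - b)


def phi_opt(rho, lam):
    """Weight minimizing g_limit (root of the quadratic, as assumed above)."""
    mu = 1.0 - lam
    s = math.sqrt(1.0 - rho ** 2)
    root = math.sqrt(1.0 - rho ** 2 + 4.0 * lam * mu * rho ** 2)
    return s * (root - s) / (2.0 * lam * rho ** 2)


if __name__ == "__main__":
    for rho in (0.7, 0.8, 0.9):
        w = phi_opt(rho, lam=0.75)  # one quarter of the sample unmatched
        print(f"rho={rho}: phi_opt={w:.3f}, g_limit={g_limit(rho, 0.75, w):.4f}, "
              f"g_limit at phi=0.2: {g_limit(rho, 0.75, 0.2):.4f}")
```

Comparing the limiting factor at $\phi_{opt}$ with the factor at the compromise $\phi = 0.2$ shows how little is lost by the rounded weight.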

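For the estimate of change, we have

$$V(\bar{y}_h' - \bar{y}_{h-1}') = V(\bar{y}_h') + V(\bar{y}_{h-1}') - 2\,\mathrm{cov}(\bar{y}_h', \bar{y}_{h-1}') \tag{12.87}$$

To find the covariance term, note that if $y_{hi}$, $y_{h-1,i}$ are the values for the $i$th unit
in the matched set on occasions $h$ and $(h - 1)$, our model is

$$y_{hi} = \bar{Y}_h + \rho(y_{h-1,i} - \bar{Y}_{h-1}) + e_{hi}$$

where the $e_{hi}$ are independent of the $y$'s. From this model it is found by
substitution that

$$\bar{y}_{hm}' = \bar{y}_{hm} + \rho(\bar{y}_{h-1}' - \bar{y}_{h-1,m}) = \bar{Y}_h + \rho(\bar{y}_{h-1}' - \bar{Y}_{h-1}) + \bar{e}_{hm}$$

Hence the covariance of $\bar{y}_{hm}'$ and $\bar{y}_{h-1}'$ is $\rho V(\bar{y}_{h-1}')$. But

$$\mathrm{cov}(\bar{y}_h', \bar{y}_{h-1}') = \mathrm{cov}\{\phi\,\bar{y}_{hu}' + (1-\phi)\,\bar{y}_{hm}',\ \bar{y}_{h-1}'\} = \rho(1-\phi)V(\bar{y}_{h-1}')$$

since $\bar{y}_{hu}'$ is independent of $\bar{y}_{h-1}'$. From (12.87), this gives

$$V(\bar{y}_h' - \bar{y}_{h-1}') = \frac{S^2}{n}\{g_h + g_{h-1}[1 - 2\rho(1-\phi)]\} \tag{12.88}$$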
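A short numerical sketch may help in following how (12.85) and (12.88) combine. It assumes $g_1 = 1$, the recurrence $g_h = a + b\,g_{h-1}$ with $a = \phi^2/\mu + (1-\phi)^2(1-\rho^2)/\lambda$ and $b = \rho^2(1-\phi)^2$, and measures gains against independent samples, for which the variance factor is 1 for a current estimate and 2 for a difference. The function names are illustrative.

```python
def g_sequence(rho, lam, phi, occasions):
    """Variance factors g_1..g_occasions for fixed matched fraction lam and weight phi."""
    mu = 1.0 - lam
    a = phi ** 2 / mu + (1.0 - phi) ** 2 * (1.0 - rho ** 2) / lam
    b = rho ** 2 * (1.0 - phi) ** 2
    g = [1.0]  # g_1 = 1: no earlier occasion to regress on
    for _ in range(occasions - 1):
        g.append(a + b * g[-1])
    return g


def pct_gain_current(g_h):
    """Percent gain of the current estimate over an independent sample."""
    return 100.0 * (1.0 - g_h) / g_h


def pct_gain_change(g_h, g_prev, rho, phi):
    """Percent gain of the estimate of change, using (12.88) in units of S^2/n."""
    g_change = g_h + g_prev * (1.0 - 2.0 * rho * (1.0 - phi))
    return 100.0 * (2.0 - g_change) / g_change


if __name__ == "__main__":
    g = g_sequence(rho=0.7, lam=0.75, phi=0.2, occasions=4)
    for h in range(2, 5):
        print(h, round(pct_gain_current(g[h - 1]), 1),
              round(pct_gain_change(g[h - 1], g[h - 2], 0.7, 0.2), 1))
```

With $\rho = 0.7$, three quarters matched, and $\phi = 0.2$, the second occasion gives a gain of about 10% for the current estimate and about 154% for the estimate of change.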


From (12.85) and (12.87), the variances of $\bar{y}_h'$ and $(\bar{y}_h' - \bar{y}_{h-1}')$ may be computed
for any values of $m$, $\phi$, and $\rho$. Table 12.5 shows the resulting percent gains in
efficiency for these estimates relative to the estimates obtained from independent
samples on each occasion. The proportions matched are $\lambda = m/n = \frac{1}{2}$ and $\frac{3}{4}$. The
weight $\phi$ was taken as 0.35 for $\lambda = \frac{1}{2}$ and 0.2 for $\lambda = \frac{3}{4}$.

TABLE 12.5
PERCENT GAINS IN EFFICIENCY FOR THE CURRENT ESTIMATE $\bar{y}_h'$ AND THE
ESTIMATE OF CHANGE $(\bar{y}_h' - \bar{y}_{h-1}')$. PROPORTIONS MATCHED: 1/2 AND 3/4

Current estimate: percent gain = $100(1 - g_h)/g_h$

           rho = 0.7     rho = 0.8     rho = 0.9     rho = 0.95
 h         1/2   3/4     1/2   3/4     1/2   3/4     1/2   3/4
 2          14    10      22    14      33    19      41    22
 3          16    14      30   ...     ...    32      67    39
 4          17    15      32    24     ...   ...     ...    52
 ∞          17    15      33    26      62    50      89    74
 RG*       ...    10      27    18     ...    40

Estimate of change: percent gain = $100(2 - g_h')/g_h'$

           rho = 0.7     rho = 0.8     rho = 0.9     rho = 0.95
 h         1/2   3/4     1/2   3/4     1/2   3/4     1/2   3/4
 2         106   153     156   233     245   399     326   565
 3         113   160     170   245     277   415     365   588
 4         115   160     174   251     285   424     388   603
 ∞         115   163     178   251     292   440     397   624
 RG*       101   163     ...   269     381   605

* Described on p. 355.

Not surprisingly, the major features of Table 12.5 are the large gains in
efficiency for the estimates of change when $\rho$ is at least 0.7. Moreover, increase in
the proportion matched from $\frac{1}{2}$ to $\frac{3}{4}$ produces substantial gains in efficiency for the
estimates of change at the expense of smaller losses in efficiency for the current
estimates. The results suggest that retention of $\frac{2}{3}$, $\frac{3}{4}$, or $\frac{4}{5}$ from one occasion to the
next may be a good practical policy if current estimates and estimates of change
are both important.

Comparison of the gains in efficiency for the current estimate $\bar{y}_h'$ in Table 12.5
with the optimum gains from Table 12.4 suggests that after the second occasion
little precision is lost by using a constant weight and a fixed proportion matched,
unless $\rho = 0.95$.

If $\rho$ exceeds 0.8, the regression coefficient $b = \rho$ may be replaced by 1 with only
a small additional loss of precision. This gives an estimate $\bar{y}_h''$ of the form

$$\bar{y}_h'' = \phi\,\bar{y}_{hu} + (1 - \phi)(\bar{y}_{h-1}'' + \bar{y}_{hm} - \bar{y}_{h-1,m}) \tag{12.89}$$


In the important Current Population Survey taken monthly by the U.S. Bureau 
of the Census, one quarter of the second-stage units are replaced each month, so 
that an individual household remains in the sample during four consecutive 
months. The household is omitted for the eight succeeding months but is then 
brought back for another four months, thus increasing slightly the precision of 
year-to-year comparisons. 


The composite estimate used in this survey is of a form related to (12.89) but
slightly different:

$$\bar{y}_h'' = (1 - K)\,\bar{y}_h + K(\bar{y}_{h-1}'' + \bar{y}_{hm} - \bar{y}_{h-1,m}) \tag{12.90}$$

where $K$ is a constant weighting factor. The difference is that $\bar{y}_h$, the current
estimate for the whole sample, takes the place of the $\bar{y}_{hu}$ in (12.89). The quantities
$\bar{y}_{hm}$, $\bar{y}_{h-1,m}$, $\bar{y}_h$ in (12.90) are ratio estimates of a fairly complex type.

The variance of $\bar{y}_h''$ (due to Bershad) is given in Hansen, Hurwitz, and Madow
(1953); see also the Appendix in Hansen et al. (1955). The estimator of the
month-to-month change is

$$d_h = \bar{y}_h'' - \bar{y}_{h-1}''$$

Since the primary units remain unchanged, only the within-units components of
$V(\bar{y}_h'')$ and $V(d_h)$ are affected by the sample rotation policy.

Rao and Graham (1964) have examined the performance of composite
estimators in rotation policies of this type, in which a respondent remains in the
sample for $r$ months and then drops out for $m$ months. They used as models an
exponential correlogram and a linear correlogram in time, descending to zero. A
more complex correlogram, needed if there is a high correlation between months
$h$ and $(h - 12)$, has been studied by Graham (1973).

Their gains in efficiency for $r = 2, 4$ and $m = \infty$ correspond to the results for
$\lambda = \frac{1}{2}, \frac{3}{4}$ in Table 12.5. For an exponential correlogram in which the correlation
between results on the same unit on occasions $h$, $h - i$ is $\rho^i$, the lines labeled RG in
Table 12.5 show for $\rho$ = 0.7, 0.8, 0.9, their percent gains in efficiency. For each $\rho$
they use in (12.90) the optimum $K$ for the current estimates as $h \to \infty$. As Table
12.5 shows, they also find that the gains in efficiency from the composite
estimators are much greater in estimating change than current level.

In a more general framework, Scott and Smith (1974) have discussed the role of 
time series methods in making estimates in repeated surveys of various types. 

In another rotation policy a new sample is drawn on each occasion, with no
matching. With weekly or monthly sampling, this plan is appropriate when annual
estimates, and to a lesser extent semiannual or quarterly estimates, are of primary
importance, for example, in an illness survey with emphasis on chronic diseases. If
the questionnaire obtains for any unit the results for the preceding month as well
as for the current month, we can consider composite estimates of the form

$$\bar{y}_h' = \bar{y}_h + \phi_h(\bar{y}_{h-1}' - \bar{y}_{h-1,h}) \tag{12.91}$$

where $\bar{y}_h$ = estimate made from current data in the current sample
$\bar{y}_{h-1,h}$ = estimate made from previous month's data in the current sample
$\bar{y}_{h-1}'$ = composite estimate for the previous month

The theory is discussed by Hansen, Hurwitz, and Madow (1953) and Woodruff
(1959), who apply it to a survey of retail sales, and by Eckler (1955). In the Retail
Trade Survey the composite estimate involves a ratio estimate, being of the form

$$\bar{y}_h' = (1 - W)\,\bar{y}_h + W\,\frac{\bar{y}_h}{\bar{y}_{h-1,h}}\,\bar{y}_{h-1}'$$

where $W$ is a weighting factor. Since month-to-month correlations are very high,
averaging around 0.98, the gains in precision are substantial. One month later a
revised composite estimate for month $h$ is computed, using the results for month $h$
from the new sample taken in month $(h + 1)$.


With this method, it is essential that the data obtained for the preceding month
from the current sample be accurate. This may not be so when the data depend on
the unrecorded memory of the respondent, although the method may work
successfully if the data are of a type that the respondent records carefully as a
routine matter.


EXERCISES 


12.1 \$3000 is allocated for a survey to estimate a proportion. The main survey will cost
\$10 per sampling unit. Information is available in files, at a cost of \$0.25 per sampling unit,
that enables the units to be classified into two strata of about equal sizes. If the true
proportion is 0.2 in stratum 1 and 0.8 in stratum 2, estimate the optimum $n$, $n'$, and the
resulting value of $V(p_{st})$. Does double sampling produce a gain in precision over single
sampling? (The ratios $n'/N$, $n/N$ may be ignored.)

12.2 For the $W_h$, $P_h$ in exercise 12.1, find the cost ratios $c_n/c_{n'}$ for which double
sampling is more economical than single sampling.

12.3 A population contains $L$ strata of equal size. If $V_{ran}$ denotes the variance of the
mean of a simple random sample and $V_{st}$, $V_{dst}$ are the corresponding variances for stratified
random sampling with proportional allocation and for double sampling with stratification,
show that, approximately,

$$nV_{ran} = S_w^2 + \frac{\sum (\bar{Y}_h - \bar{Y})^2}{L}$$

$$nV_{st} = S_w^2$$

$$nV_{dst} = S_w^2 + \frac{n}{n'}\,\frac{\sum (\bar{Y}_h - \bar{Y})^2}{L}$$

where $S_w^2$ is the average variance within strata. ($N$ and $n'$ may both be assumed large
relative to $L$, and the $n_h$ in double sampling may be assumed equal to $n/L$.)

Hence, if $(RP)_{st}$ denotes the relative precision of the stratified sample to the simple
random sample, with a corresponding definition for $(RP)_{dst}$, show that

$$(RP)_{dst} = \frac{(RP)_{st}}{1 + (n/n')\{(RP)_{st} - 1\}}$$

For $(RP)_{st} = 2$, plot $(RP)_{dst}$ against $n/n'$. How small must this ratio be in order that
$(RP)_{dst} = 1.9$?


12.4 If $\rho = 0.8$ in double sampling for regression, how large must $n'$ be relative to $n$, if
the loss in precision due to sampling errors in the mean of the large sample is to be less than
10%?

12.5 In an application of double sampling for regression, the small sample was of size
87 and the large sample of size 300. The following computations apply to the small sample.

$$\sum (y_i - \bar{y})^2 = 17{,}283, \quad \sum (y_i - \bar{y})(x_i - \bar{x}) = 5114, \quad \sum (x_i - \bar{x})^2 = 3248$$

Compute the standard error of the regression estimate of $\bar{Y}$.

12.6 For $\rho = 0.95$, verify the data given in Table 12.4 for the optimum percentage that
should be matched and for the gain in precision relative to no matching. Compute the
corresponding percent gains in precision if one third of the units are retained from the first
to the second occasion and one half of the units are retained on each subsequent occasion.


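12.7 In simple random sampling on two occasions, suppose that the estimate on the
second occasion is, in the notation of section 12.11,

$$\bar{y}_2'' = (1 - \phi)(\bar{y}_1 + \bar{y}_{2m} - \bar{y}_{1m}) + \phi\,\bar{y}_{2u}$$

(a) Ignoring the fpc, show that

$$V(\bar{y}_2'') = \frac{S^2}{n}\left\{(1 - \phi)^2\,\frac{1 + \mu(1 - 2\rho)}{\lambda} + \frac{\phi^2}{\mu}\right\}$$

where $\lambda = m/n$, $\mu = u/n$. (b) For given $\rho$, $\lambda$, $\mu$, find the value of $\phi$ that minimizes $V(\bar{y}_2'')$.
Show that if $\rho$ exceeds $\frac{1}{2}$ the best weight $\phi$ lies between $\mu$ and $\mu/(1 + \mu)$.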




12.8 For $\mu = \frac{1}{4}$, $\mu = \frac{1}{2}$, $\rho = 0.8$, and $\rho = 0.9$, compare $V(\bar{y}_2'')$ in the preceding exercise
with the variance of the optimum composite regression estimate $\bar{y}_2'$, as given by equation
12.74. (In $\bar{y}_2''$ take $\phi = 0.2$ when $\mu = \frac{1}{4}$ and $\phi = 0.4$ when $\mu = \frac{1}{2}$.) Verify that for these values
of $\rho$ the estimate $\bar{y}_2''$ is almost as precise as $\bar{y}_2'$ for both $\mu = \frac{1}{4}$ and $\mu = \frac{1}{2}$.

12.9 An independent sample of size $n$ is drawn each month. From the sample taken in
any month, data are obtained for the current and the preceding month. A composite
estimate $\bar{y}_h'$ is made as in (12.91), section 12.13.

$$\bar{y}_h' = \bar{y}_h + \phi_h(\bar{y}_{h-1}' - \bar{y}_{h-1,h})$$

The model is

$$y_{hi} = \bar{Y}_h + \rho(y_{h-1,i} - \bar{Y}_{h-1}) + e_{hi}$$

where $e_{hi}$ is independent of the $y$'s and has variance $S^2(1 - \rho^2)$. Show that (a)

$$\bar{y}_h' - \bar{Y}_h = \bar{e}_h + \phi_h(\bar{y}_{h-1}' - \bar{Y}_{h-1}) + (\rho - \phi_h)(\bar{y}_{h-1,h} - \bar{Y}_{h-1})$$

(b) If $V(\bar{y}_h') = g_h S^2/n$, where $S^2$ is constant on all occasions,

$$g_h = (1 - \rho^2) + \phi_h^2\,g_{h-1} + (\rho - \phi_h)^2$$

(c) The optimum $\phi_h = \rho/(1 + g_{h-1})$ and the resulting optimum $g_h$ is

$$g_h = 1 - \frac{\rho^2}{1 + g_{h-1}}$$

(d) The limiting $g_h$ is $g_\infty = \sqrt{1 - \rho^2}$. These results were given by Eckler (1955).


12.10 If $E_{lr} = V(\bar{y})/V(\bar{y}_{lr})$ and $E_R = V(\bar{y})/V(\bar{y}_R)$ are the relative efficiencies of the
linear regression and ratio estimates relative to the sample mean of a simple random sample, show
that for both $\bar{y}_{lr}$ and $\bar{y}_R$ the corresponding relative efficiency to $\bar{y}$ in double sampling with
optimum choice of $n/n'$ is (ignore $1/N$),

$$E_{ds} = \frac{E}{\left[1 + \sqrt{(E - 1)\,c'/c}\right]^2}$$

Hence note that with either of these estimators, double sampling will not be highly effective
unless $c'/c$ is small (e.g., < 1/10). For example, with $c'/c = 1/10$, $E = 6$ gives $E_{ds} = 2.1$.
12.11 In sampling on two occasions, suppose that $S_1 = S_2 = S$ and that the samples are
large, so that the regression coefficients of $y_{2i}$ on $y_{1i}$, and of $y_{1i}$ on $y_{2i}$ in the matched part of
the samples on the two occasions are both effectively equal to $\rho$. The estimate $\bar{y}_2'$ in section
12.10 is constructed and an analogous estimate $\bar{y}_1'$ using the regression of $y_{1i}$ on $y_{2i}$. Show
that

$$\text{(i)} \quad V(\bar{y}_2' - \bar{y}_1') = \frac{2S^2(1 - \rho)}{n - u\rho}$$

$$\text{(ii)} \quad V(\bar{y}_2' + \bar{y}_1') = \frac{2S^2(1 + \rho)}{n + u\rho}$$

(One way of doing this is to express $(\bar{y}_2' \pm \bar{y}_1')$ as linear functions of $(\bar{y}_{2m} \pm \bar{y}_{1m})$ and
$(\bar{y}_{2u} \pm \bar{y}_{1u})$, which are uncorrelated.)
Note that, as intuition suggests, (i) is minimized when $u = 0$, while (ii) is minimized when
$u = n$.

12.12 The most favorable case for the application of the method in exercise 12.1 occurs
when the true proportion is 0 in stratum 1. In estimating the total number of units $Y_1$ in the
population that possess an attribute costly to measure, this happens when there is a second
attribute, cheap to measure, such that only units having the second attribute can possess the
first attribute. In a simple random sample of size $n'$, count the number $m'$ who have
attribute 2. Draw a subsample of size $\nu m' = m'/k$ from these, and count the number $r$ who
have attribute 1.
(a) From theorems 12.1 and 12.2, show that $\hat{Y}_1 = Nkr/n'$ is an unbiased estimate of $Y_1$
with variance

$$V(\hat{Y}_1) = \frac{N^2 P_1}{n'}\left[\left(1 - \frac{n'}{N}\right)Q_1 + \frac{P_2(k - 1)}{P_1 + P_2}\right]$$

where $P_1$, $(P_1 + P_2)$ are the population proportions having attributes 1, 2. Assume $1/N(P_1 + P_2)$
negligible.
(b) With $c = 10$, $c' = 1$, the investigator guesses that $P_1 = 0.25$, $P_2 = 0.15$. Is double
sampling profitable in this case?


CHAPTER 13


Sources of Error in Surveys 


13.1 INTRODUCTION 


The theory presented in preceding chapters assumes throughout that some kind
of probability sampling is used and that the observation $y_i$ on the $i$th unit is the
correct value for that unit. The error of estimate arises solely from the random
sampling variation that is present when $n$ of the units are measured instead of the
complete population of $N$ units.

These assumptions hold reasonably well in the simpler types of surveys in which
the measuring devices are accurate and the quality of work is high. In complex
surveys, particularly when difficult problems of measurement are involved, the
assumptions may be far from true. Three additional sources of error that may be
present are as follows.


1. Failure to measure some of the units in the chosen sample. This may occur by 
oversight or, with human populations, because of failure to locate some individu- 
als or their refusal to answer the questions when located. 


2. Errors of measurement on a unit. The measuring device may be biased or 
imprecise. With human populations the respondents may not possess accurate 
information or they may give biased answers. 

3. Errors introduced in editing, coding and tabulating the results. 


These sources of error necessitate a modification of the standard theory of
sampling. The principal aims of such a modification are to provide guidance about 
the allocation of resources between the reduction of random sampling errors and 
the reduction of the other errors and to develop methods for computing standard 
errors and confidence limits that remain valid when the other errors are present. 


13.2 EFFECTS OF NONRESPONSE 


We will use the term nonresponse to refer to the failure to measure some of the
units in the selected sample. In the study of nonresponse it is convenient to think
of the population as divided into two "strata," the first consisting of all units for
which measurements would be obtained if the units happened to fall in the sample,
the second of the units for which no measurements would be obtained. The
compositions of the two strata depend intimately on the methods used to find the
units and obtain the data. A survey in which at least three calls are made, if
necessary, on every house and in which a supervisor with exceptional powers of
persuasion calls on all persons who refuse to give data will have a much smaller
"nonresponse" stratum than one in which only a single attempt is made for every
house.


TABLE 13.1
RESPONSES TO THREE REQUESTS IN A MAILED INQUIRY

                                                          Average Number
                                   Number of    % of      of Fruit Trees
                                   Growers      Population  per Grower
Response to first mailing            300          10          456
Response to second mailing           543          17          382
Response to third mailing            434          14          340
Nonrespondents after 3 mailings     1839          59          290

Total population                    3116         100          329


This division into two distinct strata is, of course, an oversimplification. Chance
plays a part in determining whether a unit is found and measured in a given
number of attempts. In a more complete specification of the problem we would
attach to each unit a probability representing the chance that it would be
measured by a given field method if it fell in the sample.

The sample provides no information about the ... stratum. ... would not matter if
it could be assumed that the characteristics ... have been made, however, it has
often ... stratum differ from units that are measurable. An illustration appears in
Table 13.1. The data come from an ... North Carolina in 1946. Three ... were sent
to growers. For one of the ... data were available for the population.

The steady decline in the number of fruit trees per grower in the successive
responses is evident, these numbers being 456 for respondents to the first mailing,
382 in the second mailing, 340 in the third, and 290 for the refusals to all three
letters. The total response was poor, more than half the population failing to give
data even after three attempts.

We now consider the effects of nonresponse on the sample estimate. Let $N_1$, $N_2$
be the numbers of units in the two strata and let $W_1 = N_1/N$, $W_2 = N_2/N$, so that
$W_2$ is the proportion of nonresponse in the population. Assume that a simple
random sample is drawn from the population. When the field work is completed,
we have data for a simple random sample from stratum 1 but no data from stratum
2. Hence the amount of bias in the sample mean is

$$E(\bar{y}_1) - \bar{Y} = \bar{Y}_1 - \bar{Y} = \bar{Y}_1 - (W_1\bar{Y}_1 + W_2\bar{Y}_2) = W_2(\bar{Y}_1 - \bar{Y}_2) \tag{13.1}$$


The amount of bias is the product of the proportion of nonresponse and the
difference between the means in the two strata. Since the sample provides no
information about $\bar{Y}_2$, the size of the bias is unknown unless bounds can be placed
on $\bar{Y}_2$ from some source other than the sample data. With a continuous variate,
the only bounds that can be assigned with certainty are often so wide as to be
useless.
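The fruit-grower figures of Table 13.1 give a concrete instance of (13.1), since the population mean happens to be known there. The sketch below treats the respondents to all three mailings as stratum 1 and the remaining growers as stratum 2; the variable names are illustrative.

```python
# (number of growers, mean fruit trees per grower) for each mailing, from Table 13.1
RESPONDENT_GROUPS = [(300, 456), (543, 382), (434, 340)]
NONRESPONDENTS = (1839, 290)

n1 = sum(count for count, _ in RESPONDENT_GROUPS)                     # stratum 1 size
mean1 = sum(count * mean for count, mean in RESPONDENT_GROUPS) / n1   # stratum 1 mean
n2, mean2 = NONRESPONDENTS                                            # stratum 2
total = n1 + n2

w1, w2 = n1 / total, n2 / total
population_mean = w1 * mean1 + w2 * mean2

# Bias of the respondent mean, computed two ways.
bias_direct = mean1 - population_mean
bias_formula = w2 * (mean1 - mean2)          # W2 (Ybar_1 - Ybar_2), as in (13.1)

if __name__ == "__main__":
    print(f"respondent mean      = {mean1:.1f}")
    print(f"population mean      = {population_mean:.1f}")
    print(f"bias, direct         = {bias_direct:.1f}")
    print(f"bias, W2(Y1 - Y2)    = {bias_formula:.1f}")
```

The respondent mean overstates the population mean of about 329 trees by roughly 56 trees per grower.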

Consequently, with continuous data, any sizable proportion of nonresponse
usually makes it impossible to assign useful confidence limits to $\bar{Y}$ from the sample
results. We are left in the position of relying on some guess about the size of the
bias, without data to substantiate the guess.

In sampling for proportions the situation is a little easier, since the unknown
proportion $P_2$ in stratum 2 must lie between 0 and 1. If $W_2$ is known, these bounds
for $P_2$ enable us to construct confidence limits for the population proportion $P$.
Suppose that a simple random sample of $n$ units is drawn and that measurements
are obtained for $n_1$ of the units in the sample. Assuming $n_1$ large enough, 95%
confidence limits for $P_1$ are given by

$$p_1 \pm 2\sqrt{p_1 q_1/n_1}$$

where $p_1$ is the sample proportion and the fpc is ignored.
When we try to derive a confidence statement about $P$, we are on safe ground if
we assume $P_2 = 0$ when finding $\hat{P}_L$ and $P_2 = 1$ when finding $\hat{P}_U$. Thus we might
take, for 95% limits,

$$\hat{P}_L = W_1\left(p_1 - 2\sqrt{p_1 q_1/n_1}\right) + W_2(0) \tag{13.2}$$

$$\hat{P}_U = W_1\left(p_1 + 2\sqrt{p_1 q_1/n_1}\right) + W_2(1) \tag{13.3}$$

It is easy to verify that these limits are conservative, that is, that

$$\Pr(\hat{P}_L \le P \le \hat{P}_U) > 0.95$$

The limits can be narrowed a little by a more careful argument (Cochran,
Mosteller, and Tukey, 1954, p. 280), since $P_2$ cannot be 0 and 1 simultaneously, as
assumed above.

The limits are distressingly wide unless $W_2$ is very small. Table 13.2 shows the
average limits for a sample size $n = 1000$ and a series of values of $W_2$ and $p_1$. Since
the limits in (13.2) and (13.3) depend on the value of $n_1$ (number of respondents in
the sample), we have taken $n_1 = nW_1$, its average value, in computing Table 13.2.
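The limits (13.2) and (13.3) are simple to compute. The following minimal sketch assumes, as in the text, that $n_1$ is replaced by its average value $nW_1$ and that the multiplier 2 is used for 95% limits; the function name is illustrative.

```python
import math


def conservative_limits(p1, w2, n):
    """Conservative 95% limits for P given respondent proportion p1,
    nonresponse proportion w2, and original sample size n.

    Lower limit assumes every nonrespondent is negative (P2 = 0),
    upper limit assumes every nonrespondent is positive (P2 = 1).
    """
    w1 = 1.0 - w2
    n1 = n * w1                                   # average number of respondents
    half = 2.0 * math.sqrt(p1 * (1.0 - p1) / n1)  # half-width for P1
    lower = w1 * (p1 - half)                      # (13.2)
    upper = w1 * (p1 + half) + w2                 # (13.3)
    return lower, upper


if __name__ == "__main__":
    for w2 in (0.0, 0.05, 0.10, 0.15, 0.20):
        lo, hi = conservative_limits(0.50, w2, 1000)
        print(f"100W2={100 * w2:4.0f}  limits=({100 * lo:.1f}, {100 * hi:.1f})")
```

With $p_1 = 50\%$ and 5% nonresponse this gives (44.4, 55.6), and with $p_1 = 10\%$ and 20% nonresponse it gives (6.3, 29.7).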


TABLE 13.2
95% CONFIDENCE LIMITS FOR $P$ (%) WHEN $n = 1000$

% Nonresponse,                    Sample Percentage, $100p_1$
$100W_2$           5               10               20               50
 0            (3.6, 6.4)      (8.1, 11.9)     (17.5, 22.5)     (46.7, 53.2)
 5            (3.4, 11.1)     (7.6, 16.3)     (16.5, 26.5)     (44.4, 55.6)
10            (3.2, 15.8)     (7.2, 20.8)     (15.6, 30.4)     (42.0, 58.0)
15            (3.0, 20.5)     (6.8, 25.2)     (14.7, 34.3)     (39.6, 60.4)
20            (2.8, 25.2)     (6.3, 29.7)     (13.7, 38.3)     (37.2, 62.8)


The rapid increase in the width of the confidence interval with increasing $W_2$ is
evident. It is of interest to examine what values of $n$ would be needed to give the
same widths of confidence interval if $W_2$ were zero. This is easily done when $p_1$ is
50%. For $W_2 = 5\%$, Table 13.2 shows that the half-width of the confidence
interval is 5.6. The equivalent sample size $n_e$, assuming no nonresponse, is found
from the equation

$$5.6 = 2\sqrt{(50)(50)/n_e}, \qquad n_e = 320$$

For $W_2$ = 10, 15, and 20%, the values of $n_e$ are 155, 90, and 60, respectively. It ...
devote a substantial proportion of the resources to the ... would have given a
positive response.

... $p_1 = 10\%$, so that 80 sample
members give a positive response and the sample nonresponse rate is 20%. Then

$$\hat{P}_L = 8 - 2\sqrt{(8)(92)/1000} = 6.3\%$$

$$\hat{P}_U = 28 + 2\sqrt{(28)(72)/1000} = 30.8\%$$

The limits are a little wider than those for $p_1 = 10\%$, $W_2 = 20\%$ in Table 13.2.

If $W_2$ is known from previous experience in the particular type of survey,
Birnbaum and Sirken (1950a, 1950b) give a method of finding the sample size $n$
that guarantees with risk $\alpha$ an absolute error in the sample proportion less than a
specified amount $d$. No advance knowledge of $P_1$, $P_2$, or $P$ is assumed. If there
were no nonresponse, we would take (by section 4.4),

$$n = t_\alpha^2\,PQ/d^2 \tag{13.4}$$

where $t_\alpha$ is the normal deviate corresponding to the risk that the error exceeds $d$.
With no advance information about $P$, the least favorable case is $P = 0.5$, giving

$$n = \frac{t_\alpha^2}{4d^2} \tag{13.5}$$

By taking the least favorable combination of the bias $W_2(P_1 - P_2)$ and the value
of $P_1$, Birnbaum and Sirken show that a value of $n$ that still guarantees an error
less than $d$, with risk $\alpha$, is approximately

$$n \doteq \frac{t_\alpha^2}{4d(d - W_2)W_1} - 1 \tag{13.6}$$

Note that no value of $n$ suffices if $W_2 \ge d$. If $W_2 = 0$, this equation reduces to (13.5)
apart from the term $-1$, which comes from an approximation in the analysis. Some
values of $n$ given by Birnbaum and Sirken's method are shown in Table 13.3.
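As a numerical check, (13.6) can be evaluated directly. This sketch reads (13.6) as $n \doteq t_\alpha^2/[4d(d - W_2)W_1] - 1$ with $W_1 = 1 - W_2$ and takes $t_\alpha = 1.96$ for risk 0.05; both the reading of the formula and the rounding up to a whole unit are assumptions made here, and the function name is illustrative.

```python
import math


def required_n(d, w2, t=1.96):
    """Approximate sample size from (13.6), as read above.

    d  : permitted absolute error in the proportion (e.g. 0.05 for 5%)
    w2 : proportion of nonresponse in the population
    t  : normal deviate for the chosen risk (1.96 for risk 0.05)
    """
    if w2 >= d:
        # The bias alone can reach W2, so no sample size can guarantee error < d.
        raise ValueError("no sample size suffices when W2 >= d")
    w1 = 1.0 - w2
    return t * t / (4.0 * d * (d - w2) * w1) - 1.0


if __name__ == "__main__":
    for w2 in (0.0, 0.02, 0.04):
        row = [math.ceil(required_n(d, w2)) for d in (0.20, 0.15, 0.10, 0.05)]
        print(f"100W2={100 * w2:3.0f}: {row}")
```

Rounded up, this reproduces, for example, the entries 653 and 2000 of Table 13.3 for $d = 5\%$ with 2% and 4% nonresponse.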

This table tells the same sad story as Table 13.2. If we are content with a crude
estimate ($d = 20\%$), amounts of nonresponse up to 10% can be handled by doubling
the sample size. However, any sizable percentage of nonresponse makes it
impossible or very costly to attain a highly guaranteed precision by increasing the
sample size among the respondents.


TABLE 13.3
SMALLEST VALUE OF $n$ FOR GIVEN LIMIT OF ERROR $d$, WITH RISK $\alpha = 0.05$

% Nonresponse,              $d$ (%)
$100W_2$            20      15      10       5
0                   24      43      96     384
2                   27      50     122     653
4                   31      60     166    2000

13.3 TYPES OF NONRESPONSE 


Some methods for handling the nonresponse problem are described in succeed- 
ing sections. A rough classification of the types of nonresponse is as follows. 


1. Noncoverage. This is failure to locate or to visit some units in the sample. 
This is a problem with areal sampling units, in which the interviewer must find 


2. Not-at-homes. This group contains persons who reside at home but are 
temporarily away from the house. Families in which both parents work and 
families without children are harder to reach than families with very young 
children or with old people confined to the house. 

3. Unable to answer. The respondent may not have the information wanted in 
certain questions or may be unwilling to give it. Skillful wording and pretesting of 
the questionnaire are a safeguard. 

4. The “hard core.” Persons who adamantly refuse to be interviewed, who 
are incapacitated, or who are far from home during the whole time available for 
field work constitute this sector. It represents a source of bias that persists no 
matter how much effort is put into completeness of returns. 


The detection and measurement of noncoverage are difficult. With areal
sampling, one method is to revisit the primary units, making a careful listing ...
the problem is easier in surveys in which any ... the interviewer lists on the
schedule the eligible persons in the household and
then numbers them: males first in order of decreasing age, then females in order of
decreasing age. Each schedule has printed on it one of the sets of instructions in
Table 13.4.

Each eligible person in a household of a given size has an equal chance of being
selected, except that adults 3 and 5 in households of size 5 are slightly
overrepresented. Since male respondents are concentrated in Tables A, B, and C, the
interviewer can devote evening calls to households so designated.


TABLE 13.4
INSTRUCTIONS FOR SELECTING A SINGLE RESPONDENT

Relative                    If Number of Adults in Household is
Frequency     Table         1     2     3     4     5     6 or more
of Use        Number        Select Adult Numbered
1/6           A             1     1     1     1     1     1
1/12          B1            1     1     1     1     2     2
1/12          B2            1     1     1     2     2     2
1/6           C             1     1     2     2     3     3
1/6           D             1     2     2     3     4     4
1/12          E1            1     2     3     3     3     5
1/12          E2            1     2     3     4     5     5
1/6           F             1     2     3     4     5     6

13.4 CALL-BACKS 


A standard technique is to specify the number of call-backs, or a minimum 
number, that must be made on any unit before abandoning it as “unable to 
contact.” Stephan and McCarthy (1958) give data from a number of surveys on 


the percentage of the total sample obtained at each call. Average results are
shown in Table 13.5.


TABLE 13.5
NUMBER OF CALLS REQUIRED FOR COMPLETED INTERVIEWS

                       % of Sample Contacted on
                  First     Second     Third or       Per Cent
Respondent        Call      Call       Later Call     Nonresponse     Total
Any adult*         70        17           8               5            100
Random adult       37        32          23               8            100

*Two surveys in which the respondent was a housewife and a farm operator, respectively,
have been included in the "any adult" group.




In surveys in which any adult in the house could answer the questions, the first 
call obtained about 70% of the sample and the first two calls, 87%. The increased 
cost of sampling when a randomly chosen adult is to be interviewed is evident, the 
first call producing only 37% of the required interviews. The marked success of 
the second call reflects the work of the interviewer in finding out in advance when 
the desired respondent would be at home and available. 

Little has been published on the relative costs of later calls to the first call. Later
calls would be expected to be more expensive per completed interview, since the
houses are more sparsely located in the area assigned to the interviewer and since
the occupants are presumably people who spend more than an average amount of
time away from home. From British experience, Durbin (1954) suggests that later
calls may be less expensive than would be anticipated. The following figures
(Table 13.6) show estimated relative costs per completed interview (i.e., money
spent on $i$th calls divided by number of new interviews obtained) for each call up
to the fifth in a special study reported by Durbin and Stuart (1954).


TABLE 13.6
RELATIVE COSTS PER NEW COMPLETED INTERVIEW AT THE iTH CALL

Call             1      2      3      4      5
Relative cost    100    112    127    151    250


only for the first assumption, the method being exactly the same for the second.
The symbol $n_0$ denotes the original sample size.

Insistence on up to three calls costs only 4% more per completed interview than
single calls if any adult is a satisfactory respondent, and only 10% more if a
random adult must be interviewed. How typical these results are is not known, but
the method provides realistic estimates of the cost of insisting on call-backs if the
necessary cost and sample size data have been collected. There is also the time
factor: call-backs delay the final results.

TABLE 13.7
RELATIVE COSTS PER COMPLETED INTERVIEW UP TO THE ith CALL

                       Respondent = Any Adult                          “Random” Adult
               At ith Call               Up to ith Call
       Relative   No. of      Cost of     Total No.   Total        Cost       No. of      Cost
Call   Cost       Ints.*      Ints.       of Ints.    Cost         per Int.   Ints.       per Int.
1      100        0.70n_0     70n_0       0.70n_0     70n_0        100        0.37n_0     100
2      112        0.17n_0     19.04n_0    0.87n_0     89.04n_0     102        0.32n_0     106
3      127        0.07n_0     8.89n_0     0.94n_0     97.93n_0     104        0.16n_0     110
4      151        0.04n_0     6.04n_0     0.98n_0     103.97n_0    106        0.09n_0     114
5      250        0.02n_0     5.00n_0     1.00n_0     108.97n_0    109        0.06n_0     122

* Interviews
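
The arithmetic behind Table 13.7 can be reproduced in a few lines. This is a minimal sketch, not the authors' worksheet: the fractions of new interviews per call and the relative costs are taken from Tables 13.7 and 13.6, and the function name `cost_per_interview` is hypothetical.

```python
# Fraction of the original sample n_0 first interviewed at calls 1..5 (Table 13.7)
new_fraction = {
    "any adult": [0.70, 0.17, 0.07, 0.04, 0.02],
    "random adult": [0.37, 0.32, 0.16, 0.09, 0.06],
}
# Relative cost per new completed interview at the ith call (Table 13.6)
relative_cost = [100, 112, 127, 151, 250]


def cost_per_interview(fractions, costs):
    """Cumulative cost per completed interview when up to i calls are made."""
    total_interviews = 0.0   # cumulative interviews, in units of n_0
    total_cost = 0.0         # cumulative cost, in units of n_0
    result = []
    for fraction, cost in zip(fractions, costs):
        total_interviews += fraction
        total_cost += fraction * cost
        result.append(total_cost / total_interviews)
    return result


for respondent, fractions in new_fraction.items():
    rounded = [round(c) for c in cost_per_interview(fractions, relative_cost)]
    print(respondent, rounded)
```

The third entry of each list divided by the first gives the extra cost of insisting on up to three calls, which is the comparison quoted in the text.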


13.5 A MATHEMATICAL MODEL OF THE EFFECTS
OF CALL-BACKS


Deming (1953) developed a useful and flexible mathematical model for examining
in more detail the consequences of different call-back policies. The population
is divided into $r$ classes, according to the probability that the respondent
will be found at home. Let

$w_{ij}$ = probability that a respondent in the $j$th class will be reached on or
before the $i$th call

$p_j$ = proportion of the population falling in the $j$th class

$\mu_j$ = item mean for the $j$th class

$\sigma_j^2$ = item variance for the $j$th class

For simplicity we assume $w_{ij} > 0$ for all classes, although the method is
easily adapted to include persons impossible to reach. If $\bar{y}_{ij}$ is the mean for
those in class $j$ who were reached on or before the $i$th call, it is also assumed
that $E(\bar{y}_{ij}) = \mu_j$.

The true population mean for the item is

$$\bar{\mu} = \sum_j p_j \mu_j \tag{13.7}$$


Consider the composition of the sample after $i$ calls. The persons in the sample can
be classified into $(r + 1)$ classes as follows: in the first class and interviewed; in the
second class and interviewed; and so on. The $(r + 1)$th class consists of all those not
yet interviewed after $i$ calls. If the fpc is ignored, the numbers falling in these
$(r + 1)$ classes are distributed according to the multinomial

$$\Bigl[w_{i1}p_1 + w_{i2}p_2 + \cdots + w_{ir}p_r + \Bigl(1 - \sum_j w_{ij}p_j\Bigr)\Bigr]^{n_0}$$

where $n_0$ is the initial size of the sample.

It follows that the number $n_i$ who have been interviewed in the course of $i$ calls
is binomially distributed with number of trials $= n_0$ and probability of success
$\sum_j w_{ij}p_j$. Hence

$$E(n_i) = \text{expected number of interviews in } i \text{ calls} = n_0 \sum_j w_{ij}p_j \tag{13.8}$$


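For fixed $n_i$, the numbers of interviews $n_{ij}$ obtained $(j = 1, 2, \ldots, r)$ follow a
multinomial with probabilities $w_{ij}p_j / \sum_j w_{ij}p_j$. It follows that

$$E(n_{ij} \mid n_i) = \frac{n_i w_{ij} p_j}{\sum_j w_{ij} p_j}$$

Hence, if $\bar{y}_i$ is the sample mean obtained after $i$ calls,

$$E(\bar{y}_i \mid n_i) = E\Bigl(\frac{\sum_j n_{ij}\bar{y}_{ij}}{n_i}\Bigr) = \frac{\sum_j n_i w_{ij} p_j \mu_j}{n_i \sum_j w_{ij} p_j} = \frac{\sum_j w_{ij} p_j \mu_j}{\sum_j w_{ij} p_j} = \bar{\mu}_i \tag{13.9}$$

Since this result does not depend on $n_i$, the unconditional mean of $\bar{y}_i$ is also
$\bar{\mu}_i$. The bias in the estimate $\bar{y}_i$ is therefore $(\bar{\mu}_i - \bar{\mu})$.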


The conditional variance of $\bar{y}_i$ for given $n_i$ is found similarly to be

$$V(\bar{y}_i \mid n_i) = \frac{\sum_j w_{ij} p_j [\sigma_j^2 + (\mu_j - \bar{\mu}_i)^2]}{n_i \sum_j w_{ij} p_j} \tag{13.10}$$


The unconditional variance, ignoring terms of order $1/n^2$, is given approximately
by replacing $n_i$ in (13.10) by its expected value from (13.8).
Finally, the mean square error of the estimate obtained after $i$ calls is

$$\text{MSE}(\bar{y}_i) = V(\bar{y}_i) + (\bar{\mu}_i - \bar{\mu})^2 \tag{13.11}$$

The cost of making $i$ calls must also be considered. The expected number of new
interviews obtained in the $k$th call is $n_0 \sum_j (w_{kj} - w_{k-1,j}) p_j$. Hence, if $c_k$ is the
cost per completed interview at the $k$th call, the expected total cost of making
$i$ calls is $n_0 C(i)$, where

$$C(i) = c_1 \sum_j w_{1j} p_j + c_2 \sum_j (w_{2j} - w_{1j}) p_j + \cdots + c_i \sum_j (w_{ij} - w_{i-1,j}) p_j$$


Example. A population with three classes is described in Table 13.8. At the first
call the probabilities of obtaining an interview are 0.6, 0.3, and 0.1 in the three
classes. At the second and later calls the probabilities of reaching persons
missed previously are 0.9, 0.5, and 0.2. These figures were made higher than the
corresponding probabilities at the first call in order to represent the effect of
intelligent inquiry by the interviewer.


TABLE 13.8
CHARACTERISTICS OF THE THREE CLASSES

                                                Class
             1                               2                               3
$p_j$        0.45                            0.50                            0.05
$w_{ij}$     $0.6 + (0.4)[1 - (0.1)^{i-1}]$  $0.3 + (0.7)[1 - (0.5)^{i-1}]$  $0.1 + (0.9)[1 - (0.8)^{i-1}]$
I   $\mu_j$  55                              50                              45
II  $\mu_j$  60                              50                              40


TABLE 13.9
NUMBER OF INTERVIEWS, COSTS PER INTERVIEW AND BIASES

Number of      Number of      Average
Calls          Interviews     Cost per        I          II
Required       Obtained       Interview       Bias       Bias
1              0.425n_0       100             +1.118     +2.235
2              0.771n_0       105             +0.711     +1.421
3              0.882n_0       108             +0.421     +0.842
4              0.933n_0       110             +0.266     +0.532
5              0.960n_0       114             +0.180     +0.360


The item being estimated is a binomial percentage close to 50%. Two sets of $\mu_j$ are
considered (I, II). For simplicity, the within-class variances $\sigma_j^2 = \mu_j(100 - \mu_j)$ were
all taken as 2500. The relative costs per completed interview at successive calls were
those given in Table 13.6.

Table 13.9 shows (a) the expected total number of interviews obtained for a total of $i$ calls,
(b) the average cost of these calls per interview, and (c) the bias $(\bar{\mu}_i - \bar{\mu})$ in the
estimate $\bar{y}_i$ under assumptions I and II about the $\mu_j$.

In II, for example, the true population mean $\bar{\mu}$ is 54%. The mean $\bar{\mu}_1$ obtained from first
calls is 56.235%, giving the bias of +2.235% shown in the table. A policy that requires
three calls reduces this bias to +0.842%.
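
The entries of Table 13.9 follow directly from equations (13.7) to (13.9) and the cost function $C(i)$. The sketch below is a hedged illustration: it assumes first-call probabilities 0.6, 0.3, 0.1 and later-call probabilities 0.9, 0.5, 0.2 (which reproduce the $w_{ij}$ of Table 13.8), class means (55, 50, 45) for I and (60, 50, 40) for II (the latter agreeing with the stated population mean of 54%), and the costs of Table 13.6. All function names are hypothetical.

```python
p = [0.45, 0.50, 0.05]                  # class proportions p_j
first_call = [0.6, 0.3, 0.1]            # chance of an interview at the first call
later_call = [0.9, 0.5, 0.2]            # chance at each later call, if missed so far
mu = {"I": [55, 50, 45], "II": [60, 50, 40]}   # class means mu_j
cost = [100, 112, 127, 151, 250]        # cost per new interview at call k


def w(i, j):
    """w_ij: probability that class j is reached on or before call i."""
    return first_call[j] + (1 - first_call[j]) * (1 - (1 - later_call[j]) ** (i - 1))


def interviews(i):
    """Expected interviews in i calls as a fraction of n_0, equation (13.8)."""
    return sum(w(i, j) * p[j] for j in range(3))


def bias(i, mu_j):
    """Bias of the mean after i calls: (13.9) minus (13.7)."""
    mean_i = sum(w(i, j) * p[j] * mu_j[j] for j in range(3)) / interviews(i)
    true_mean = sum(pj * m for pj, m in zip(p, mu_j))
    return mean_i - true_mean


def average_cost(i):
    """Expected cost per completed interview when up to i calls are made."""
    total, previous = 0.0, 0.0
    for k in range(1, i + 1):
        reached = interviews(k)
        total += cost[k - 1] * (reached - previous)   # new interviews at call k
        previous = reached
    return total / interviews(i)


for i in range(1, 6):
    print(i, round(interviews(i), 3), round(average_cost(i)),
          round(bias(i, mu["I"]), 3), round(bias(i, mu["II"]), 3))
```

The printed rows agree with Table 13.9 up to rounding in the last decimal place.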

The values of MSE($\bar{y}$) obtained from a given expenditure of money were compared for
the different call-back policies. In the first comparisons the amount of money is sufficient to
take $n_0 = 500$ if only one call is made. From Table 13.9 the expected number of interviews
obtained in the first call is $E(n_1) = (500)(0.425) = 212.5$. If two calls are made, this expected
number must be reduced to $E(n_2) = 212.5/1.05 = 202.4$ to maintain the same cost, and
similarly for 3, 4, and 5 call-backs. These values of $E(n_i)$ were substituted in equation
(13.10) to give $V(\bar{y}_i)$ and hence MSE($\bar{y}_i$).

TABLE 13.10
VALUES OF MSE($\bar{y}$) FOR DIFFERENT CALL-BACK POLICIES
COSTING THE SAME AMOUNT

                       $n_0 = 500$
Number of        (for first calls only)        $n_0 = 1000$       $n_0 = 2000$
Calls            No
Required         Bias     I*       II*          I       II         I       II
1                11.8     13.0     16.9         7.1     10.9       4.2     8.0
2                12.4     12.9     14.6         6.7     8.3        3.6     5.2
3                12.7     12.9     13.6         6.5     7.1        3.4     3.9
4                13.0     13.1     13.4         6.6     6.9        3.3     3.6
5                13.5     13.5     13.8         6.8     6.9        3.4     3.5

* These represent populations with smaller (I) and greater (II) amounts of
bias, as defined in Table 13.8.


Table 13.10 presents the resulting MSE’s for three amounts of expenditure,
corresponding to $n_0 = 500$, 1000, 2000 for a single call. When $n_0 = 500$, the values of
MSE($\bar{y}$) are also given for the “no bias” situation in which every $\mu_j = 50$. This column
shows the effect of call-backs when they are unnecessary, since no bias results from
confining the survey to a single call.
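
The equal-cost comparison can be sketched as follows. This is an illustration under stated assumptions, not the original computation: it uses the class proportions, call probabilities, and costs described for the example, a common within-class variance of 2500, class means (50, 50, 50), (55, 50, 45), and (60, 50, 40) for the three populations, and unrounded average costs, so some entries may differ from Table 13.10 by one unit in the last place. The helper names are hypothetical.

```python
p = [0.45, 0.50, 0.05]
first_call = [0.6, 0.3, 0.1]
later_call = [0.9, 0.5, 0.2]
cost = [100, 112, 127, 151, 250]
sigma2 = 2500.0    # within-class variance, taken equal in every class
populations = {"no bias": [50, 50, 50], "I": [55, 50, 45], "II": [60, 50, 40]}


def reached(i):
    """w_ij * p_j for each class after i calls."""
    return [pj * (f + (1 - f) * (1 - (1 - l) ** (i - 1)))
            for pj, f, l in zip(p, first_call, later_call)]


def cost_per_unit(i):
    """C(i): expected cost of i calls per unit of the initial sample."""
    total, previous = 0.0, 0.0
    for k in range(1, i + 1):
        now = sum(reached(k))
        total += cost[k - 1] * (now - previous)
        previous = now
    return total


def mse(i, mu, n0_single_call):
    """MSE after i calls for the budget that pays for n0_single_call first calls."""
    budget = n0_single_call * cost_per_unit(1)
    n0 = budget / cost_per_unit(i)            # initial sample affordable with i calls
    wp = reached(i)
    expected_n = n0 * sum(wp)                                        # (13.8)
    mean_i = sum(a * m for a, m in zip(wp, mu)) / sum(wp)            # (13.9)
    true_mean = sum(pj * m for pj, m in zip(p, mu))                  # (13.7)
    variance = sum(a * (sigma2 + (m - mean_i) ** 2)
                   for a, m in zip(wp, mu)) / (expected_n * sum(wp))  # (13.10)
    return variance + (mean_i - true_mean) ** 2                      # (13.11)


for name, mu in populations.items():
    print(name, [round(mse(i, mu, 500), 1) for i in range(1, 6)])
```

Calling `mse(i, populations["II"], 2000)` for `i` from 1 to 5 shows how strongly a single call is penalized once the sample is large enough for the bias to dominate the variance.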


The policies giving the lowest MSE’s are shown in boldface type. Consider first the
smallest sample size, $n_0 = 500$. If call-backs are unnecessary, a policy demanding as many
as four call-backs results in only a modest increase in the MSE. In I, involving the smaller
amount of bias, the different policies produce about the same accuracy, although three is
the optimum. In II, three to five call-backs are satisfactory, a single call giving a MSE about
25% above the minimum.

For the larger sample sizes the optimum number of call-backs increases to four or five,
and the use of a single call results in more substantial losses of accuracy.


This is, of course, only an illustration. The importance of the method is that as 
information accumulates about costs and relative biases an economical policy can 
be worked out for any specific type of survey. 


13.6 OPTIMUM SAMPLING FRACTION AMONG THE 
NONRESPONDENTS 


After the first attempt to reach the persons in the sample has been made,
another approach, due to Hansen and Hurwitz (1946), is to take a random
subsample of the persons who have not been reached and make a major effort to
interview everyone in the subsample. This technique was first developed for
surveys in which the initial attempt was made by mail, a subsample of persons who
did not return the completed questionnaire being approached by the more
expensive method of a personal interview.

This method can be regarded as an application of the technique of double
sampling for stratification presented in section 12.2. We first take a simple random
sample of $n'$ units. Let $n_1'$ be the number of units in the sample that provide the
data sought and $n_2'$ the number in the nonresponse group. By intensive efforts, the
data are later obtained from a random subsample $n_2 = \nu_2 n_2'$ out of the $n_2'$. Hansen
and Hurwitz use the notation $\nu_2 = 1/k$, so that $n_2 = n_2'/k$.

In the framework of section 12.2 there are two strata. Stratum 1 consists of
those who would respond to a first attempt, with a measured sample of size
$n_1 = n_1'$, so that $\nu_1 = 1$. Stratum 2 consists of those who would respond to the
second attempt, with $n_2 = n_2'/k$.

The cost of taking the sample is

$$c_0 n' + c_1 n_1' + \frac{c_2 n_2'}{k} \tag{13.12}$$

where the $c$’s are the costs per unit: $c_0$ is the cost of making the first attempt, $c_1$ is
the cost of processing the results from the first attempt, and $c_2$ is the cost of getting
and processing the data in the second stratum. If $W_1$, $W_2$ are the population
proportions in the two strata, the expected cost is

$$C = c_0 n' + c_1 W_1 n' + \frac{c_2 W_2 n'}{k} \tag{13.13}$$

As an estimate of $\bar{Y}$, we take

$$\bar{y}' = w_1 \bar{y}_1 + w_2 \bar{y}_2 = \frac{n_1' \bar{y}_1 + n_2' \bar{y}_2}{n'} \tag{13.14}$$

where $\bar{y}_1$, $\bar{y}_2$ are the means of the samples of sizes $n_1 = n_1'$ and $n_2 = n_2'/k$. By
theorem 12.1, the estimate $\bar{y}'$ will be unbiased if responses are obtained from all
the selected random subsample of size $n_2 = n_2'/k$.

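By formula (12.3), the variance of $\bar{y}'$ is

$$V(\bar{y}') = \Bigl(\frac{1}{n'} - \frac{1}{N}\Bigr) S^2 + \frac{W_2 S_2^2}{n'}\Bigl(\frac{1}{\nu_2} - 1\Bigr) = \Bigl(\frac{1}{n'} - \frac{1}{N}\Bigr) S^2 + \frac{(k-1)}{n'} W_2 S_2^2 \tag{13.15}$$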


The quantities $n'$ and $k$ are then chosen to minimize the product $C(V + S^2/N)$.
From (13.15) and (13.13) we have

$$V + \frac{S^2}{N} = \frac{(S^2 - W_2 S_2^2)}{n'} + \frac{k W_2 S_2^2}{n'} \tag{13.16}$$

$$C = (c_0 + c_1 W_1) n' + \frac{c_2 W_2 n'}{k} \tag{13.17}$$

By Cauchy’s inequality, the optimum $k$ is

$$k_{opt} = \sqrt{\frac{c_2 (S^2 - W_2 S_2^2)}{S_2^2 (c_0 + c_1 W_1)}} \tag{13.18}$$


The initial sample size $n'$ may be chosen either to minimize $C$ for specified $V$ or
$V$ for specified $C$ by solving for $n'$ from (13.16) or (13.17). If $V$ is specified,

$$n'_{opt} = \frac{N[S^2 + (k-1) W_2 S_2^2]}{(NV + S^2)} \tag{13.19}$$

where $V$ is the value specified for the variance of the estimated population mean.

The solutions require a knowledge of $W_2$: this can often be estimated from
previous experience. In addition to $S^2$, whose value must be estimated in advance
in any “sample size” problem, the solutions also involve $S_2^2$, the variance in the
nonresponse stratum. The value of $S_2^2$ may be harder to predict; it will probably
not be the same as $S^2$. For instance, in surveys made by mail of most kinds of
economic enterprise, the respondents tend to be larger operators, with larger
between-unit variances than the nonrespondents.

If $W_2$ is not well known, a satisfactory approximation is to work out the value of
$n'_{opt}$ from (13.18) and (13.19) for a range of assumed values of $W_2$
between 0 and a safe upper limit. The maximum $n'_{opt}$ in this series is adopted as the
initial sample size $n'$. When the replies to the mail survey have been received, the
value of $n_2'$ is known. In seeking the value of $k$ to be used with this method, we use
the variance $V_c(\bar{y}')$ conditional on the known values of $n_2'$ and $n'$. This can be
obtained from (12.4) and (12.7) as

$$V_c(\bar{y}') = S^2\Bigl(\frac{1}{n'} - \frac{1}{N}\Bigr) + \frac{(k-1) n_2'}{n'^2} S_2^2 \tag{13.20}$$

Equation (13.20) is solved to find the $k$ that gives the desired conditional
variance. The cost for this method is usually only slightly higher than the optimum
cost for known $W_2$.


Example. This example is condensed from the paper by Hansen and Hurwitz (1946).
The first sample is taken by mail and the response rate $W_1$ is expected to be 50%. The
precision desired is that which would be given by a simple random sample of size 1000 if
there were no nonresponse. The cost of mailing a questionnaire is 10 cents, and the cost of
processing the completed questionnaire is 40 cents. To carry out a personal interview costs
\$4.10.

How many questionnaires should be sent out and what percentage of the nonrespondents
should be interviewed?

In terms of the cost function (13.12) the unit costs in dollars are as follows.

$c_0$ = cost of first attempt = 0.1

$c_1$ = cost of processing data for a respondent = 0.4

$c_2$ = cost of obtaining and processing data for a nonrespondent = 4.5

The optimum $n'$ and $k$ can be found from (13.19) and (13.18). If the variances $S^2$ and $S_2^2$
are assumed equal and $N$ is assumed to be large, then

$$k_{opt} = \sqrt{\frac{c_2 (1 - W_2)}{c_0 + c_1 W_1}} = \sqrt{\frac{(4.5)(0.5)}{0.1 + (0.4)(0.5)}} = \sqrt{7.5} = 2.739$$

$$n'_{opt} = \frac{S^2 [1 + (k-1) W_2]}{V} = 1000\{1 + (1.739)(0.5)\} = 1870$$

Note that we have put $S^2/V = 1000$, or $V = S^2/1000$, since this is the variance that the
sample mean would have if a sample of 1000 were taken and complete response were
obtained.

Consequently, 1870 questionnaires should be mailed. Of the 935 that are not returned,
we interview a random subsample of 935/2.739, or 341. The cost is \$2095.
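
The numbers in this example can be retraced with a few lines of arithmetic. The sketch below is only an illustration under the example's own simplifications ($S^2 = S_2^2$, $N$ large); the variable names are hypothetical, and rounding the mailing size up to a whole questionnaire is an assumption of the sketch.

```python
import math

c0, c1, c2 = 0.10, 0.40, 4.50    # unit costs in dollars
W1, W2 = 0.5, 0.5                # expected response and nonresponse rates
n_equivalent = 1000              # S^2 / V: size of the equivalent full-response sample

# Equation (13.18) with S^2 = S_2^2
k_opt = math.sqrt(c2 * (1 - W2) / (c0 + c1 * W1))

# Equation (13.19) with N large, rounded up to a whole questionnaire
n_mail = math.ceil(n_equivalent * (1 + (k_opt - 1) * W2))

nonrespondents = W2 * n_mail
subsample = round(nonrespondents / k_opt)          # personal interviews

# Expected cost from (13.13), using the rounded subsample size
total_cost = c0 * n_mail + c1 * W1 * n_mail + c2 * subsample

print(round(k_opt, 3), n_mail, subsample, round(total_cost, 2))
```

The total comes to about 2095 dollars, of which roughly three quarters is spent on the personal interviews.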


As Durbin (1954) has pointed out, subsampling is unlikely to show a marked
profit unless $c_2$ is large in relation to $(c_0 + c_1 W_1)$. The two quantities are comparable,
since $(c_0 + c_1 W_1)$ is the expected cost per unit of making the first attempt and
processing the results and $c_2$ has the same meaning for the second attempt. From
the equations it can be shown that the ratio of the cost of obtaining a prescribed $V$
with $k = 1$ (no subsampling) to the minimum cost for optimum $k$ is

$$\frac{S^2 (c_0 + c_1 W_1 + c_2 W_2)}{\bigl[\sqrt{(S^2 - W_2 S_2^2)(c_0 + c_1 W_1)} + \sqrt{c_2 W_2^2 S_2^2}\,\bigr]^2} \doteq \frac{c_0 + c_1 W_1 + c_2 W_2}{\bigl[\sqrt{W_1 (c_0 + c_1 W_1)} + \sqrt{c_2}\, W_2\bigr]^2}$$

if $S^2$ and $S_2^2$ are approximately equal. If $r$ is the ratio of $c_2$ to $(c_0 + c_1 W_1)$, the cost
ratio becomes

$$\frac{1 + r W_2}{(\sqrt{W_1} + \sqrt{r}\, W_2)^2}$$

For instance, the cost ratio is 1.029 for $W_2 = 0.5$ if $r = 4$, 1.146 if $r = 10$, and 1.228
if $r = 16$. If, however, $S^2$ is substantially greater than $S_2^2$, there is
more to be gained from subsampling.
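
The three cost ratios quoted above are easy to verify. A minimal sketch, assuming $S^2 = S_2^2$ and $W_1 = 1 - W_2$; the function name `cost_ratio` is hypothetical.

```python
import math


def cost_ratio(r, W2):
    """Cost with k = 1 divided by the minimum cost, where r = c2 / (c0 + c1 * W1)."""
    W1 = 1 - W2
    return (1 + r * W2) / (math.sqrt(W1) + math.sqrt(r) * W2) ** 2


for r in (4, 10, 16):
    print(r, round(cost_ratio(r, 0.5), 3))
```

Even when the second attempt costs 16 times as much per unit as the first, subsampling saves less than a fifth of the cost under these assumptions, which is the point of Durbin's remark.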


With stratified sampling, the optimum values of the $n_h'$ and the $k_h$ in the
individual strata are rather complex. A good approximation is to estimate first, by
the methods in sections 5.5 and 5.9, the sample sizes $n_{0h}$ that would be required in
the strata if there were no nonresponse. Now, from (13.19), if $W_2 = 0$, we have

$$n_0 = \frac{N S^2}{NV + S^2} \tag{13.21}$$

Hence (13.19) can be written as

$$n'_{opt} = n_0 \Bigl[1 + \frac{(k-1) W_2 S_2^2}{S^2}\Bigr] \tag{13.22}$$

This equation, applied separately to each stratum, gives an approximation to
the optimum $n_h'$. The values of $k_h$ are found by applying (13.18) in each stratum.

These techniques can be used with ratio or regression estimates. With the ratio
estimate, the quantities $S^2$ and $S_2^2$ are replaced by $S_d^2$ and $S_{2d}^2$, where
$d_i = y_i - R x_i$. With a regression estimate, $S^2$ becomes $S^2(1 - \rho^2)$ and $S_2^2$ becomes
$S_2^2(1 - \rho_2^2)$.


13.7 ADJUSTMENTS FOR BIAS WITHOUT CALL-BACKS 


An ingenious method of diminishing the biases present in the results of the first
call was suggested by Hartley (1946) and developed by Politz and Simmons (1949,
1950) and Simmons (1954). Suppose that all calls are made during the evening on
the six week-nights. The respondent is asked whether he was at home, at the time
of the interview, on each of the five preceding weeknights. If the respondent states
that he was at home $t$ nights out of five, the ratio $(t+1)/6$ is taken as an estimate of
the frequency $\pi$ with which he is at home during interviewing hours.

The results from the first call are sorted into six groups according to the value of
$t$ (0, 1, 2, 3, 4, 5). In the $t$th group let $n_t$ be the number of interviews obtained and
$\bar{y}_t$ the item mean. The Politz-Simmons estimate of the population mean $\bar{\mu}$ is

$$\bar{y}_{PS} = \frac{\sum_{t=0}^{5} 6 n_t \bar{y}_t/(t+1)}{\sum_{t=0}^{5} 6 n_t/(t+1)} \tag{13.23}$$


This approach recognizes that the first call results are unduly weighted with
persons who are at home most of the time. Since a person who is at home, on the
average, a proportion $\pi$ of the time has a relative chance $\pi$ of appearing in the
sample, his response should receive a weight $1/\pi$. The quantity $6/(t+1)$ is used as
an estimate of $1/\pi$. Thus $\bar{y}_{PS}$ is less biased than the sample mean $\bar{y}$ from the first
call, but its variance is greater, because an unweighted mean is replaced by a
weighted mean with estimated weights.
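
The weighting in (13.23) can be sketched as a small function. The group counts and means below are invented purely for illustration (they are not survey data), and the name `politz_simmons` is hypothetical.

```python
def politz_simmons(n_t, ybar_t):
    """Weighted mean of equation (13.23).

    n_t[t] and ybar_t[t] are the number of interviews and the item mean for
    respondents who report being at home on t of the previous five nights.
    """
    numerator = sum(6 * n * y / (t + 1) for t, (n, y) in enumerate(zip(n_t, ybar_t)))
    denominator = sum(6 * n / (t + 1) for t, n in enumerate(n_t))
    return numerator / denominator


# Hypothetical first-call results for t = 0, 1, ..., 5
n_t = [10, 20, 30, 50, 70, 120]
ybar_t = [40.0, 44.0, 47.0, 50.0, 53.0, 56.0]

unweighted = sum(n * y for n, y in zip(n_t, ybar_t)) / sum(n_t)
weighted = politz_simmons(n_t, ybar_t)
print(round(unweighted, 2), round(weighted, 2))
```

In this made-up case the item mean rises with time spent at home, so the weighted estimate falls below the unweighted first-call mean: respondents who are seldom at home count for up to six times as much as those who are always at home.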

In presenting the mean and variance of $\bar{y}_{PS}$, we use the notation of section 13.5.
The population is divided into classes, people in the $j$th class being at home a
fraction $\pi_j$ of the time. Note that the $t$th group (i.e., persons at home $t$ nights out of
the preceding five) will contain persons from various classes. Let $n_{jt}$, $\bar{y}_{jt}$ be the
number and the item mean for those in class $j$ and group $t$. Then $\bar{y}_{PS}$ may be
written as follows.

$$\bar{y}_{PS} = \frac{\sum_j \sum_t 6 n_{jt} \bar{y}_{jt}/(t+1)}{\sum_j \sum_t 6 n_{jt}/(t+1)} = \frac{N}{D} \tag{13.24}$$

This is a ratio type of estimate. In large samples its mean is approximately
$E(N)/E(D)$.

If $n_0$ is the initial size of sample (responses plus not-at-homes) and $n_j$ is the
number from class $j$ who are interviewed, the following assumptions are made.

(i) $n_j/n_0$ is a binomial estimate of $p_j \pi_j$

(ii) $E(n_{jt} \mid n_j) = n_j \dfrac{5!}{t!(5-t)!} \pi_j^t (1 - \pi_j)^{5-t}$

(iii) $E(\bar{y}_{jt}) = \mu_j$ for any $j$ and $t$


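Assumption (ii) is open to question. Without giving a detailed discussion, it
assumes that people report correctly whether they were at home.
For given $j$, using assumption (ii),

$$E\Bigl[\sum_{t=0}^{5} \frac{6 n_{jt}}{(t+1)} \Bigm| n_j\Bigr] = n_j \sum_{t=0}^{5} \frac{6}{(t+1)} \frac{5!}{t!(5-t)!} \pi_j^t (1-\pi_j)^{5-t} \tag{13.25}$$

$$= \frac{n_j}{\pi_j}[1 - (1-\pi_j)^6] \tag{13.26}$$

Hence

$$E(D) = \sum_j \frac{E(n_j)}{\pi_j}[1 - (1-\pi_j)^6] = n_0 \sum_j p_j [1 - (1-\pi_j)^6] \tag{13.27}$$

using assumption (i). Furthermore, since $E(\bar{y}_{jt}) = \mu_j$ for any $j$ and $t$, this gives the
result

$$E(\bar{y}_{PS}) \doteq \bar{\mu}_{PS} = \frac{\sum_j p_j \mu_j [1 - (1-\pi_j)^6]}{\sum_j p_j [1 - (1-\pi_j)^6]} \tag{13.28}$$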


Since the true mean $\bar{\mu} = \sum_j p_j \mu_j$, some bias remains in $\bar{y}_{PS}$. In a certain sense, this
estimate has the same bias as $\bar{y}_6$, the sample mean given by the call-back method
with a requirement that as many as six calls be made if necessary. In section 13.5,
equation (13.9), it was shown that the call-back method, with a total of $i$ calls,
gives an unbiased estimate of $\bar{\mu}_i = \sum_j w_{ij} p_j \mu_j / \sum_j w_{ij} p_j$, where $w_{ij}$ is the probability
that a person in class $j$ who falls in the sample will be interviewed. Now $w_{1j} = \pi_j$. If
at subsequent calls the probability of finding at home a person not previously
reached remains at $\pi_j$, then

$$w_{6j} = [1 - (1 - \pi_j)^6]$$

so that $\bar{\mu}_{PS} = \bar{\mu}_6$. However, with the call-back method the probability of an
interview at a later call may be greater than $\pi_j$ as a result of information obtained
by the interviewer at the first or earlier calls. In this event the call-back method has
less bias after six calls.
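
The step from (13.25) to (13.26), and the size of the bias that remains, can be checked numerically. The sketch below is illustrative: it borrows the class proportions and the means (60, 50, 40) of the earlier call-back example and treats its first-call probabilities 0.6, 0.3, 0.1 as the $\pi_j$, which is an assumption made here and not a calculation from the text. The function names are hypothetical.

```python
from math import comb


def expected_weight(pi):
    """Sum over t of [6/(t+1)] times the binomial(5, pi) probability of t, as in (13.25)."""
    return sum(6 / (t + 1) * comb(5, t) * pi ** t * (1 - pi) ** (5 - t) for t in range(6))


def closed_form(pi):
    """[1 - (1 - pi)^6] / pi, as in (13.26)."""
    return (1 - (1 - pi) ** 6) / pi


p = [0.45, 0.50, 0.05]     # class proportions p_j
pi = [0.6, 0.3, 0.1]       # assumed fractions of time at home, pi_j
mu = [60, 50, 40]          # class means mu_j

true_mean = sum(pj * m for pj, m in zip(p, mu))

# Mean of first-call respondents: classes weighted by p_j * pi_j
first_call_mean = (sum(pj * q * m for pj, q, m in zip(p, pi, mu))
                   / sum(pj * q for pj, q in zip(p, pi)))

# Limiting mean of the Politz-Simmons estimate, equation (13.28)
a = [1 - (1 - q) ** 6 for q in pi]
ps_mean = (sum(pj * aj * m for pj, aj, m in zip(p, a, mu))
           / sum(pj * aj for pj, aj in zip(p, a)))

print(round(first_call_mean - true_mean, 3), round(ps_mean - true_mean, 3))
```

Under these assumed values the weighting removes most, but not all, of the first-call bias, in line with the statement that some bias remains in the estimate.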

The variance of $\bar{y}_{PS}$ is rather complicated. With the usual approximation for a
ratio estimate, it may be expressed, following Deming (1953), as


= RS UL g 
V (Fps) =e m7 PBiloy + (u;— Hips)" 


(ital) Ds (mp) (B; = AP - Āps)’} 
where 
U= 1-Y p(1 =m;)É 


a= -a-m 


5 6 12.2951 
B= [Sl pee tale | ‘= Sit 
PANE E 
Although this expression is difficult to appraise without applying it to specific
populations, two comments can be made. If the $\mu_j$ do not differ greatly, that is, if
the bias from first calls is moderate, the dominating term is the first,

$$\frac{1}{n_0 U^2} \sum_j p_j B_j \sigma_j^2$$

This expression tends to be 25 to 35% higher than the variance of the unweighted
mean of the first calls. Also, $V(\bar{y}_{PS})$ contains a term that does not decrease as $n_0$
increases and becomes important in very large samples.

To summarize, comparisons made on simulated populations by Deming (1953), 
Durbin (1954), and myself suggest that this method shows to best advantage, in 
relation to call-backs, when the biases from early calls are substantial and the 
sample is large. The reductions in MSE for the same outlay are small, however, 
unless call-backs cost substantially more than postulated here. The Politz- 
Simmons technique has the advantage of saving time. Errors and incompleteness 
in the values of $t$, not considered in the analysis, are a disadvantage. The method
may also be applied, as suggested by Simmons (1954), in conjunction with several 
call-backs. 

Several other methods for mitigating the “not-at-home” bias have been
proposed. Bartholomew’s (1961) applies to a survey with two calls. He supposes
that, for those not at home on the first call, the interviewer, by careful inquiry, can
make the probability of finding them on the second call approximately equal. If
this is so, the $n_2$ persons interviewed at the second call are a random subsample of
the $(n_0 - n_1)$ persons missed at the first call. Hence $[n_1 \bar{y}_1 + (n_0 - n_1)\bar{y}_2]/n_0$ is an
unbiased estimate of the mean of the initial target sample. The method worked
well on some British surveys to which Bartholomew applied it. In repeated
surveys Kish and Hess (1959a) suggest that nonresponses from recent surveys
may serve as a replacement for nonresponses in a current survey. Wherever the
bias from early calls shows a systematic pattern, as in Table 13.1, Hendricks
(1949) has outlined extrapolation methods to estimate the average results that
would be given by nonrespondents.
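
Bartholomew's two-call estimate described above amounts to letting the second-call respondents stand in for everyone missed at the first call. A minimal sketch follows; the figures are invented for illustration and the function name is hypothetical.

```python
def bartholomew_mean(n0, n1, ybar1, ybar2):
    """Two-call estimate [n1*ybar1 + (n0 - n1)*ybar2] / n0.

    n0: initial sample size; n1: interviews obtained at the first call;
    ybar1, ybar2: item means at the first and second calls.
    """
    return (n1 * ybar1 + (n0 - n1) * ybar2) / n0


# Hypothetical survey: 1000 persons selected, 400 reached at the first call
estimate = bartholomew_mean(n0=1000, n1=400, ybar1=56.0, ybar2=52.0)
print(estimate)
```

Note that the number actually interviewed at the second call does not enter the estimate; it affects only the precision of the second-call mean.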


13.8 A MATHEMATICAL MODEL FOR ERRORS OF 
MEASUREMENT 


Conceptually, we can imagine that a large number of independent repetitions
of the measurement on the $i$th unit are possible. Let $y_{i\alpha}$ be the value obtained in
the $\alpha$th repetition. Then

$$y_{i\alpha} = \mu_i + e_{i\alpha} \tag{13.29}$$

where $\mu_i$ = correct value

$e_{i\alpha}$ = error of measurement.


The idea of a “correct value” requires a little discussion. With some items the 
concept is simple and concrete. For instance, in an inventory taken by sampling, 
the correct value may be the number of fan belts lying on a shelf at 12 noon on a
specified day. In some cases the correct value can be defined operationally. A 
person’s correct diastolic blood pressure at a specified time might be defined as the 
value obtained when it is measured by a certain standard instrument under 
carefully prescribed conditions. We may realize, however, that our standard 
instrument is itself subject to errors of measurement, and we may expect that in 
course of time a more precise instrument will be developed. With other items, for 
instance, some aspect of an employee’s attitude toward his employer or of a 
person’s feelings of ability to cope with his day-to-day problems, nobody may 
claim to have a satisfactory method of measuring the “correct value.” Neverthe- 
less, the concept is useful even in such cases. 

Under repeated measurements of the same unit, the errors $e_{i\alpha}$ will follow a
frequency distribution. For the $i$th unit, let $e_{i\alpha}$ have mean $\beta_i$ and variance $\sigma_i^2$. The
term $\beta_i$ represents a bias in the measurements. The magnitudes of $\beta_i$ and $\sigma_i^2$ will,
of course, depend on the nature of the item being measured and on the measuring
instrument. They may depend also on numerous other factors. With human
populations the prevailing economic and political climate and the amount and
type of advance publicity received by the survey may influence the responses to
the questionnaire.

The next step is to consider how the errors of measurement change when we 
move from one unit to another. Various complications can arise. 

For the bias component $\beta_i$, there may be a constant bias, say, $E(\beta_i) = \beta$, that
affects all units in the population. There will also be a component $(\beta_i - \beta)$ that
follows a frequency distribution over the population. This component may be
correlated with the correct value $\mu_i$; for instance, the measuring device may
consistently underestimate high values of $\mu_i$ and overestimate low values.

There may be a correlation between the values of $e_{i\alpha}$ on different units in the
same sample. The simplest example is the “interviewer bias.” Dramatic differences
are sometimes found in the mean values of $y_{i\alpha}$ obtained by different
interviewers who are sampling comparable parts of the same population (see
Lienau, 1941, Mahalanobis, 1946, and Barr, 1957).

A similar effect has appeared when samples of a growing crop are cut by 
different teams and when chemical or biological analyses are done in different 
laboratories. The human factor is not the only cause for correlations among units 
that are measured at about the same time. Many measuring processes are affected 
by the weather; some use raw materials whose quality varies from batch to batch. 
In estimating the current sale price of homes built some years ago, Hansen, 
Hurwitz, and Bershad (1961) point out that if some houses in the sample have 
been sold recently their prices establish a level that guides the interviewer and the 
householder in assigning values to houses that have not been sold for many years. 
In fact, the average price recorded for the sample may depend on the order in 
which the recently sold houses appear in the sample. 

In order to handle these intrasample correlations in their most general terms, a
more complex model than that presented here is required. In particular, the
notation for $e_{i\alpha}$ and $\beta_i$ would have to indicate that their values may depend on the
other units present in the sample. However, the types of correlation that are
believed to be most common in practice can be represented by the present model
or by simple extensions of it.

The components of the error of measurement are summarized in Table 13.11.
We have noted further that values of $\beta_i$ and $d_{i\alpha}$ on different units in the same
sample may be correlated with one another, where $d_{i\alpha} = e_{i\alpha} - \beta_i$.


TABLE 13.11
COMPONENTS OF THE ERROR OF MEASUREMENT ON THE ith UNIT

Symbol                                   Nature of Component
$\beta$                                  Constant bias over all units
$\beta_i - \beta$                        Variable component of bias, which follows some frequency
                                         distribution with mean zero as $i$ varies and may be
                                         correlated with the correct value $\mu_i$
$d_{i\alpha} = e_{i\alpha} - \beta_i$    Fluctuating component of error, which follows some
                                         frequency distribution with mean zero and variance
                                         $\sigma_i^2$ as $\alpha$ varies for fixed $i$


Models in general similar to the preceding have been developed by Hansen
et al. (1951), Sukhatme and Seth (1952), Hansen, Hurwitz, and Bershad (1961),
and Hansen, Hurwitz, and Pritzker (1965). Fellegi (1964) has given a more
extensive model that includes numerous cross correlations that may exist between
different sources of measurement error in practical situations.

In their 1961 and 1965 papers, Hansen et al. expressed our model in slightly
different terminology, including adding a subscript $G$ to all variables as a reminder
that the errors and their sizes may depend on the general ($G$) conditions of the
survey. They express results in terms of the $d_{i\alpha}$ and of

$$\mu_i' = E(y_{i\alpha} \mid i) = \mu_i + \beta_i \tag{13.30}$$

The quantity $\mu_i'$ (which they denote by $Y_i$, or $P_i$ with a proportion) is
conceptually the average obtained from many repetitions of the measuring
process on unit $i$. Hence

$$d_{i\alpha} = e_{i\alpha} - \beta_i = y_{i\alpha} - \mu_i' \tag{13.31}$$

They call $d_{i\alpha}$ the response deviation on unit $i$, whereas we have called it the
fluctuating component of the measurement error. Thus

$$y_{i\alpha} - \mu = d_{i\alpha} + (\mu_i' - \mu') + (\mu' - \mu) \tag{13.32}$$

where $\mu'$ is the population mean of the $\mu_i'$.
Averaging over the sample, we have

$$\bar{y}_\alpha - \mu = \bar{d}_\alpha + (\bar{\mu}' - \mu') + (\mu' - \mu) \tag{13.33}$$

This gives the formula

$$\text{MSE}(\bar{y}_\alpha) = V(\bar{d}_\alpha) + V(\bar{\mu}') + (\mu' - \mu)^2 + 2\,\text{cov}(\bar{d}_\alpha, \bar{\mu}') \tag{13.34}$$

For the sample mean $\bar{y}_\alpha$, the terms on the right are called, respectively, the
response variance, the sampling variance, the square of the overall bias, and twice
the covariance between the sample’s average response deviation and sampling
error. Since $E(d_{i\alpha} \mid i) = 0$ under our model, this covariance vanishes under repetitions
of the measuring process on the same set of sample units in the same order. It
might not vanish under repetitions over different samples or different orders if the
$d_{i\alpha}$ are affected by the other units in the sample. Although this term will be ignored
here for simplicity, Fellegi (1964) has shown in a Canadian study that it may
materially bias some methods of estimating components of the response variance
$V(\bar{d}_\alpha)$.

Using Cornfield’s approach, as described in section 2.9, Koch (1973) has given a
general decomposition of the MSE of the estimator in multivariate sample
surveys, with applications to subclass means.


13.9 EFFECTS OF CONSTANT BIAS


Suppose that the measurements $y_i$ on all units are subject to a constant bias $\beta$
whose magnitude is unknown. Then the mean $\bar{y}$ of a simple random sample is also
subject to bias $\beta$. In the estimated error variance, which we attach to the sample
mean, the bias cancels out, since this estimate is derived from a sum of squares of
terms $(y_i - \bar{y})$. Consequently, the usual computation of confidence limits for $\bar{Y}$
from the sample data takes no account of the bias. The same results hold in
stratified random sampling.

The situation is essentially the same with regression and ratio estimates.
Consider the regression estimate

$$\bar{y}_{lr} = \bar{y} + b(\bar{X} - \bar{x})$$

where both the $y_i$ and the $x_i$ may be subject to constant biases $\beta_y$ and $\beta_x$,
respectively. Since the least squares estimate $b$ remains unchanged and since the
bias $\beta_x$ cancels out of the term $(\bar{X} - \bar{x})$, it follows that $\bar{y}_{lr}$ is subject to a bias $\beta_y$. It is
easy to verify that the sample estimate of $V(\bar{y}_{lr})$ contains no contribution due to
the biases.


With the ratio estimate

$$\bar{y}_R = \frac{\bar{y}}{\bar{x}} \bar{X}$$

the bias is also $\beta_y$, to a first approximation, since in large samples $E(\bar{X}/\bar{x})$ is
approximately 1 even if the $x_i$ are subject to a constant bias. In large samples the
sample estimate of variance

$$v(\bar{y}_R) = \frac{(N-n)}{nN} \frac{\sum (y_i - \hat{R} x_i)^2}{(n-1)} \tag{13.35}$$

will be almost free from bias as an estimate of

$$E[\bar{y}_R - E(\bar{y}_R)]^2$$

that is, as an estimate of the variance about the biased mean.

To summarize, a constant bias passes undetected by the sample data. As we
have seen (section 1.8), the 95% confidence probabilities are almost unaffected if
the ratio of the bias $\beta$ to the standard error of the estimated mean is less than 0.1 but, as
the ratio increases beyond this value, the computation of confidence limits
becomes misleading. Estimates of change from one time period to another, or
from one stratum to another, remain unbiased, provided that the bias is constant
throughout.
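
A constant bias leaves the sample variance untouched, which is why the sample itself gives no warning. The following sketch makes the point with invented numbers; the values, the bias of 2.5, and the variable names are all hypothetical.

```python
from statistics import mean, variance

# Hypothetical correct values for two groups of sampled units
period_1 = [12.0, 15.0, 9.0, 20.0, 14.0, 17.0]
period_2 = [13.0, 18.0, 11.0, 22.0, 15.0, 19.0]
beta = 2.5   # hypothetical constant bias added by the measuring process

measured_1 = [y + beta for y in period_1]
measured_2 = [y + beta for y in period_2]

shift_in_mean = mean(measured_1) - mean(period_1)                 # equals beta
change_in_variance = variance(measured_1) - variance(period_1)    # equals zero
true_change = mean(period_2) - mean(period_1)
measured_change = mean(measured_2) - mean(measured_1)

print(shift_in_mean, change_in_variance, true_change, measured_change)
```

The mean moves by the full amount of the bias, the estimated variance does not move at all, and the estimated change between the two groups is unaffected.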


13.10 EFFECTS OF ERRORS THAT ARE 
UNCORRELATED WITHIN THE SAMPLE 


In this section a few results about $V(\bar{y}_\alpha)$ and $v(\bar{y}_\alpha)$ are given for simple random sampling in the simplest situation in which errors of measurement are uncorrelated within the sample. This situation may apply in surveys taken from records, in self-filled questionnaires (as in mail surveys) in which members of the same sample do not consult one another, and in some surveys of inanimate populations in which the measurement is objective. The assumption of no correlation cannot be made lightly: intrasample correlation may enter, for example, in interviewing, editing, coding, and transferring the data to the computer if the same person handles a number of elements in the sample.

From (13.34) we get, for the variance,

$$V(\bar{y}_\alpha) = V(\bar{d}_\alpha) + V(\bar{\mu}') \tag{13.36}$$

In finding variances we average first over repetitions of the measuring process on a specific sample and then over different simple random samples. Now $V(d_{i\alpha}) = \sigma_i^2$ and errors are uncorrelated on different units within the sample. Hence, for simple random sampling, where $S$ denotes a specific sample,

$$E(\bar{d}_\alpha^2 \mid S) = \frac{1}{n^2}\sum_{i=1}^{n}\sigma_i^2 \tag{13.37}$$

When we average subsequently over all simple random samples, we have

$$V(\bar{y}_\alpha) = \frac{1}{nN}\sum_{i=1}^{N}\sigma_i^2 + \frac{(N-n)}{nN}\,\frac{\sum_{i=1}^{N}(\mu_i' - \bar{\mu}'_N)^2}{N-1} \tag{13.38}$$

$$= \frac{1}{n}\left[\sigma_d^2 + (1-f)S_{\mu'}^2\right] \tag{13.39}$$

where $\bar{\mu}'_N$ is the population mean of the $\mu_i'$ and $\sigma_d^2$ denotes the population average of the variances of the errors of measurement. In the Hansen et al. terminology, $\sigma_d^2/n$ is the response variance of the sample estimate $\bar{y}_\alpha$.
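
As a numerical illustration of (13.38) and (13.39), the following Python sketch splits $V(\bar{y}_\alpha)$ into its response-variance and sampling-variance parts for a small made-up population. The function name, its arguments, and the numbers are assumptions chosen only for illustration; none of them come from the text.

```python
def variance_of_mean_uncorrelated(mu_prime, sigma2, n):
    """Evaluate (13.39) for a finite population with uncorrelated errors.

    mu_prime: values mu_i' = mu_i + beta_i for all N units (hypothetical data)
    sigma2:   error variances sigma_i^2 for the same N units
    n:        size of the simple random sample
    """
    N = len(mu_prime)
    pop_mean = sum(mu_prime) / N
    # S_{mu'}^2: population variance of the mu_i' with divisor N - 1
    s2_mu = sum((m - pop_mean) ** 2 for m in mu_prime) / (N - 1)
    # sigma_d^2: population average of the error variances
    sigma_d2 = sum(sigma2) / N
    f = n / N
    response = sigma_d2 / n          # response variance of the sample mean
    sampling = (1 - f) * s2_mu / n   # sampling variance with the fpc
    return response, sampling, response + sampling


response, sampling, total = variance_of_mean_uncorrelated(
    [1.0, 2.0, 3.0, 4.0, 5.0], [0.5] * 5, n=2
)
print(response, sampling, total)
```

Keeping the two parts separate makes it easy to see which of them dominates for a given sample size.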

With uncorrelated errors the same model can be applied in estimating a population proportion $P$ (Hansen, Hurwitz, and Bershad, 1961). For any unit, let the correct value $\mu_i$ be 1 if the unit is in class C and zero otherwise. If errors of measurement occur, this implies that units are sometimes incorrectly classified. For the $i$th unit, the recorded value $y_{i\alpha}$ is sometimes 1, sometimes 0. Let $P_i$ denote the proportion of measurements on the $i$th unit for which $y_{i\alpha} = 1$. Then, for given $i$, $y_{i\alpha}$ is a binomial variate in repeated measurement, with mean $\mu_i' = P_i$, while the variance of $d_{i\alpha}$ is $P_iQ_i$. Hence, if $p_\alpha = \bar{y}_\alpha$ is the sample estimate, (13.38) becomes

$$V(p_\alpha) = V(\bar{y}_\alpha) = \frac{1}{nN}\sum_{i=1}^{N} P_iQ_i + \frac{(1-f)}{n}\,\frac{\sum_{i=1}^{N}(P_i - P)^2}{N-1} \tag{13.40}$$

$$\le \frac{1}{n(N-1)}\left(\sum_{i=1}^{N} P_iQ_i + \sum_{i=1}^{N} P_i^2 - NP^2\right) = \frac{NPQ}{n(N-1)} \tag{13.41}$$

where $P = \sum P_i/N$ and $Q = 1 - P$.


As (13.41) shows, the sum of the response variance and the sampling variance has an upper limit $NPQ/n(N-1)$. This upper limit is also the variance of the sample mean of the correct measurements on the units if the fpc is ignored and there is no overall bias. For the correct measurement $\mu_i$ on any unit is then a binomial variate with mean $P$, so that

$$\frac{S_\mu^2}{n} = \frac{NPQ}{n(N-1)} \tag{13.42}$$

This rather puzzling result holds because (i) the sampling variance entering into (13.39) and (13.40) is that of $\mu_i' = (\mu_i + \beta_i)$ and (ii) in the estimation of a proportion, $\mu_i$ and $\beta_i$ are always negatively correlated. When $\mu_i = 1$, $\beta_i \le 0$, since $P_i = (\mu_i + \beta_i) \le 1$, and similarly when $\mu_i = 0$, $\beta_i \ge 0$. Thus the term $S_{\mu'}^2/n$ in (13.39) is always less than $S_\mu^2/n$ by just about the response variance of $\bar{y}_\alpha$.
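
The bound in (13.41) can be checked numerically. The sketch below evaluates (13.40) for a hypothetical set of unit-level probabilities $P_i$ and compares the result with $NPQ/n(N-1)$; the function name and the example values are assumptions made for illustration.

```python
def proportion_variance(P_units, n):
    """Return (V(p_alpha) from (13.40), upper limit NPQ / n(N-1) from (13.41)).

    P_units: probability P_i of recording a 1 on each of the N units (hypothetical)
    n:       size of the simple random sample
    """
    N = len(P_units)
    P = sum(P_units) / N
    Q = 1 - P
    f = n / N
    # response variance part: (1 / nN) * sum of P_i * Q_i
    response = sum(p * (1 - p) for p in P_units) / (n * N)
    # sampling variance part: ((1 - f) / n) * sum (P_i - P)^2 / (N - 1)
    sampling = (1 - f) / n * sum((p - P) ** 2 for p in P_units) / (N - 1)
    bound = N * P * Q / (n * (N - 1))
    return response + sampling, bound


variance, bound = proportion_variance([0.9, 0.8, 0.1, 0.2], n=2)
print(variance, bound, variance <= bound)
```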

With uncorrelated errors, a useful result is that the usual formula for $v(\bar{y}_\alpha)$ in simple random sampling remains unbiased if the fpc is negligible. From theorem 2.4 corollary, this formula, developed by assuming no errors of measurement, is

$$v(\bar{y}_\alpha) = \frac{(1-f)}{n}s_y^2 = \frac{(1-f)}{n(n-1)}\sum_{i=1}^{n}(y_{i\alpha} - \bar{y}_\alpha)^2 \tag{13.43}$$

Now

$$y_{i\alpha} - \bar{y}_\alpha = (d_{i\alpha} - \bar{d}_\alpha) + (\mu_i' - \bar{\mu}') \tag{13.44}$$

Squaring and averaging first over repeated measurements and then over repeated sample selections, we obtain

$$E\,v(\bar{y}_\alpha) = \frac{(1-f)}{n}\left(\sigma_d^2 + S_{\mu'}^2\right) \tag{13.45}$$

while, from (13.39),

$$V(\bar{y}_\alpha) = \frac{1}{n}\sigma_d^2 + \frac{(1-f)}{n}S_{\mu'}^2 \tag{13.46}$$

Thus $E\,v(\bar{y}_\alpha) = V(\bar{y}_\alpha)$ if $f$ is negligible.

In the same way the formulas in preceding chapters for the sample estimates of sampling error variances can be shown to remain valid in stratified and multistage sampling, as do the large-sample formulas applicable to ratio and regression estimates, provided that errors of measurement in $y_i$ and $x_i$ are uncorrelated within the sample and that the fpc's are negligible.


13.11 EFFECTS OF INTRASAMPLE CORRELATION 
BETWEEN ERRORS OF MEASUREMENT 


Suppose that some or all of the values of $d_{i\alpha}$ for units in the same sample are correlated. The term $\bar{d}_\alpha^2$ can be written

$$\bar{d}_\alpha^2 = \frac{1}{n^2}\left(\sum_i d_{i\alpha}^2 + 2\sum_{i<j} d_{i\alpha}d_{j\alpha}\right) \tag{13.47}$$

Hence, averaging over repeated measurements and simple random samples,

$$V(\bar{d}_\alpha) = E(\bar{d}_\alpha^2) = \frac{\sigma_d^2}{n} + \frac{(n-1)}{n}E(d_{i\alpha}d_{j\alpha}) \tag{13.48}$$

where the products are averaged over all pairs of units in the same sample. By analogy with cluster sampling, the average intrasample correlation coefficient $\rho_w$ may be defined by the equation

$$E(d_{i\alpha}d_{j\alpha}) = \rho_w\sigma_d^2 \tag{13.49}$$

This gives, from (13.48),

$$V(\bar{d}_\alpha) = \frac{\sigma_d^2}{n}[1 + (n-1)\rho_w] \tag{13.50}$$

Hansen, Hurwitz, and Bershad (1961) have called $V(\bar{d}_\alpha)$ the total response variance as it affects the sample mean. Its component $\sigma_d^2/n$ is called the simple response variance, while the term $(n-1)\rho_w\sigma_d^2/n$ is the correlated component of the total response variance.

From (13.34) we get, assuming $\operatorname{cov}(\bar{d}_\alpha, \bar{\mu}') = 0$,

$$V(\bar{y}_\alpha) = \frac{1}{n}\left\{(1-f)S_{\mu'}^2 + \sigma_d^2[1 + (n-1)\rho_w]\right\} \tag{13.51}$$

The average value of the usual $v(\bar{y}_\alpha)$ in (13.43) is found in the same way to be

$$E\,v(\bar{y}_\alpha) = \frac{(1-f)}{n}\left[\sigma_d^2(1-\rho_w) + S_{\mu'}^2\right] \tag{13.52}$$

Since $\rho_w$ is likely to be positive for many types of measurement error, the standard formula for $v(\bar{y}_\alpha)$ is usually an underestimate in this case and makes the sample estimate appear more precise than it is. This is true even when the fpc is negligible.

Perhaps the most frequent example of intrasample correlation between errors of measurement is the intrainterviewer correlation previously mentioned, particularly on questions involving opinions and judgment. Suppose that $n = mk$, and that each of the $k$ interviewers obtains data from $m$ respondents. If we can assume that there is no correlation between the errors of measurement of different interviewers,

$$V(\bar{d}_\alpha) = \frac{\sigma_d^2}{n}[1 + (m-1)\rho_w] \tag{13.53}$$

where $\rho_w$ is the average intrainterviewer correlation coefficient.

Even a small $\rho_w$ may make a major contribution to $V(\bar{y}_\alpha)$, since it is multiplied by roughly the size $m$ of the interviewer's assignment. There may also be some correlation between errors of measurement for different interviewers (e.g., if they have been trained or directed by the same supervisor).
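
To see how quickly a small intrainterviewer correlation inflates the total response variance in (13.53), the sketch below separates the simple response variance from the correlated component. The function name and the numerical inputs ($\sigma_d^2 = 1$, $n = 1000$, $m = 50$, $\rho_w = 0.02$) are hypothetical values chosen for illustration.

```python
def response_variance_components(sigma_d2, n, m, rho_w):
    """Split V(d-bar) of (13.53) into its two components.

    sigma_d2: average error variance sigma_d^2
    n:        total sample size (n = m * k)
    m:        number of respondents per interviewer
    rho_w:    average intrainterviewer correlation coefficient
    """
    simple = sigma_d2 / n                        # simple response variance
    correlated = (m - 1) * rho_w * sigma_d2 / n  # correlated component
    return simple, correlated, simple + correlated


simple, correlated, total = response_variance_components(1.0, 1000, 50, 0.02)
print(simple, correlated, total)
```

With these assumed inputs the correlated component is nearly as large as the simple response variance, although $\rho_w$ is only 0.02.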

This model represents only the simplest type of intrasample correlation. With 
stratified sampling, for instance, a coder may process results from several strata 
and through a misunderstanding of instructions may introduce correlated errors 
that extend over the strata. The mathematical model can be adapted to apply to 
situations of this type. 


13.12 SUMMARY OF THE EFFECTS OF ERRORS 
OF MEASUREMENT 


In terms of the model, the mean $\bar{y}$ of a simple random sample would be unbiased, with variance $S_\mu^2/n$ (ignoring the fpc), if all measurements were fully accurate. As a result of the types of errors of measurement discussed here, the mean may be subject to a bias of amount $\beta$ and its mean square error is

$$\mathrm{MSE}(\bar{y}_\alpha) = \frac{1}{n}\left\{S_{\mu'}^2 + \sigma_d^2[1 + (n-1)\rho_w]\right\} + \beta^2 \tag{13.54}$$

where $\mu_i' = \mu_i + \beta_i$.

Formula 13.54 contains two terms, $S_{\mu'}^2/n$ and $\sigma_d^2(1-\rho_w)/n$, that decrease as $1/n$. The remaining two terms, $\rho_w\sigma_d^2$ and $\beta^2$, appear at first sight to be independent of $n$. This is probably an oversimplification. Any material change in the size of sample may require a change in the field methods of measurement, and this may affect $\rho_w$ and $\beta^2$. However, these two terms should change relatively slowly, if at all, with $n$. Thus, in large samples, the MSE is likely to be dominated by these two terms, the ordinary sampling variance becoming unimportant and misleading as a guide to the real accuracy of the results.
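
The behaviour described above can be made concrete with a short sketch that groups the terms of (13.54) into those that shrink as $1/n$ and those that do not. The function name and all input values are assumptions for illustration only, and the sketch treats $\rho_w$ and $\beta$ as fixed, which the text notes is an oversimplification.

```python
def mse_terms(s2_mu, sigma_d2, rho_w, beta, n):
    """Group the terms of (13.54), ignoring the fpc.

    Returns (terms decreasing as 1/n, terms not depending on n, total MSE).
    """
    shrinking = s2_mu / n + sigma_d2 * (1 - rho_w) / n
    persistent = rho_w * sigma_d2 + beta ** 2
    return shrinking, persistent, shrinking + persistent


# Hypothetical inputs: S^2 = 100, sigma_d^2 = 20, rho_w = 0.01, beta = 0.5
for n in (100, 10000):
    shrinking, persistent, total = mse_terms(100.0, 20.0, 0.01, 0.5, n)
    print(n, shrinking, persistent, total)
```

Under these assumed values the persistent terms account for almost all of the MSE at the larger sample size.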


13.13 THE STUDY OF ERRORS OF MEASUREMENT 


In recent years much of the research on sampling practice has been devoted to the study of errors of measurement. The objectives are to discover the components that make a large contribution to the MSE and to find ways of decreasing these contributions. Some of the principal methods are described in this and the following sections. It is already clear that progress will be slow and expensive. One reason is that, as already mentioned, the measurement errors depend intimately both on the items and on the measuring process. Results about measurement errors found in one survey can seldom be assumed to apply to other surveys.

Ideally, the best method of studying errors of measurement is to obtain the correct values $\mu_i$. In practice, this approach is limited to items for which a feasible method of finding $\mu_i$ exists and by problems of expense and execution. Examples are given by Belloc (1954), who compared data on hospitalization as reported in household interviews with the hospital records for the individual, and by Gray (1955), who compared employees' statements of sick leave with the personnel office records. Checks of this type, sometimes called "record-checks," are possible with items such as age, occupation, number of years of schooling, and price paid for car. One difficulty is that sometimes the records contain no exact match of the person interviewed.

Failing a method of determining the correct value, an alternative is to remeasure by an independent method that is considered more accurate. Kish and Lansing (1954) engaged professional appraisers to estimate the selling prices of homes that had already been reported by the home owners. In surveys of illness, respondents' replies have been compared either with doctors' records on the respondents or with the results of a complete medical examination (Sagen, Dunham, and Simmons, 1959; Trussell and Elinson, 1959). The results of such comparisons are not easy to interpret in terms of the model, since the superior instrument is itself subject to measurement errors, but the comparisons will at least indicate the items for which the routine instrument agrees well with the superior instrument and those for which it does not.

In household interview surveys, a method with a similar purpose is to reinterview a subsample of the respondents by more expert interviewers, the questionnaire being more detailed and probing. After the reinterview, the expert discusses with the respondent any discrepancies between the original and the repeat answers, the objectives being to determine the most accurate answer and the reason for the discrepancy. Much useful information may be gained. In presenting some devices for the measurement of response errors, Madow (1965) has discussed the use of double sampling, with a difference estimator of the form $[\bar{y}' - (\bar{y} - \bar{x})]$ as a means of reducing response bias. Here, $\bar{x}$ is the mean of unbiased or less biased measurements made on the subsample, while $\bar{y}'$ and $\bar{y}$ are means of the original measurements on the sample and subsample.
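
A minimal sketch of the difference estimator just described follows: the full-sample mean of the original measurements is corrected by the discrepancy observed on the reinterviewed subsample. The function and argument names, and the tiny data set, are hypothetical and serve only to show the arithmetic.

```python
from statistics import mean


def difference_estimate(sample_original, sub_original, sub_accurate):
    """Double-sampling difference estimator for reducing response bias.

    sample_original: original measurements on the whole sample
    sub_original:    original measurements on the reinterviewed subsample
    sub_accurate:    less biased remeasurements on the same subsample
    """
    # the subsample discrepancy estimates the response bias of the original method
    estimated_bias = mean(sub_original) - mean(sub_accurate)
    return mean(sample_original) - estimated_bias


print(difference_estimate([9.0, 10.0, 11.0, 10.0], [10.0, 11.0], [9.0, 10.0]))
```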

Occasionally, overall comparisons between the results of two different surveys are feasible. For a number of items, the results of the U.S. Census can be compared with those given by the Current Population Survey taken at the same time. Since the Survey is considered more accurate, particularly for items difficult to measure, rough estimates of the measurement bias $\beta$ in the Census data can be made (Hansen, Hurwitz, and Bershad, 1961). A number of comparisons between the results of quota samples and probability samples are discussed by Stephan and McCarthy (1958).

In the following sections we consider some methods designed to produce quantitative estimates of the components of the total variance (response variance plus sampling variance) of a sample estimate.


13.14 REPEATED MEASUREMENT OF SUBSAMPLES 


In recent studies, interest has centered on estimating: (a) the total response 
variance and the relative sizes of the simple response variance and the correlated 
component as contributors to it, and (b) the relative sizes of the total response 
variance and the sampling variance. For (a) a common method is to select a 
subsample of the measuring agents (interviewers, coders, etc.) and remeasure 
their assignments by second agents presumably of the same skill. Suppose mk is
the size of the subsample, m being the size of an agent’s assignment and k the 
number of agents chosen from the original sample. 


Let $y_{i1}$, $y_{i2}$ be the two measurements on the $i$th unit of an agent's assignment. If (13.31) holds,

$$y_{i\alpha} = d_{i\alpha} + \mu_i' \tag{13.55}$$

Hence, averaging over the assignment,

$$E\,\frac{\sum_{i=1}^{m}(y_{i1} - y_{i2})^2}{2m} = \frac{\sigma_{d1}^2 + \sigma_{d2}^2}{2} \tag{13.56}$$

This expression estimates $\sigma_{d1}^2$, the simple response variance for agent 1 in the survey, if two conditions (A) hold: no correlation between response errors $d_{i1}$, $d_{i2}$ on the same unit; and $\sigma_{d1}^2 = \sigma_{d2}^2$. Equation (13.56) applies to a single pair of agents and is averaged over the subsample of $k$ pairs of agents.

Conditions A may hold when the measurement is coding, the agents being coders of similar skill trained by different supervisors, neither coder seeing the other's work. With interviewing, a positive $\operatorname{cov}(d_{i1}, d_{i2})$ is to be expected because some respondents repeat a first incorrect response from memory. In this event $\sum^m (y_{i1} - y_{i2})^2/2m$ underestimates $\sigma_{d1}^2$. It also underestimates $\sigma_{d1}^2$ if the second agent is more skilled than the first, as shown for a (0, 1) measurement by Hansen, Hurwitz, and Pritzker (1965). Moreover, $\sum (y_{i1} - y_{i2})^2/2m$ has been found to decline if the second interviewer is given the responses obtained by the first interviewer, even if told not to look at them until the repeat interview is completed (Koons, 1973). These complexities illustrate why the realistic study of errors of measurement is difficult.

For the total response variance the relevant estimate from (13.55) for a single pair of agents is $(\bar{y}_{\cdot 1} - \bar{y}_{\cdot 2})^2/2$, where $\bar{y}_{\cdot 1}$, $\bar{y}_{\cdot 2}$ are the means of the $m$ first and second measurements. Under conditions (A),

$$E\,\frac{(\bar{y}_{\cdot 1} - \bar{y}_{\cdot 2})^2}{2} = \frac{\sigma_d^2}{m}[1 + (m-1)\rho_w] \tag{13.57}$$

where $\rho_w$ is the correlation between response errors on different units by the same agent. Equation (13.57) provides only a single degree of freedom, but is averaged over the $k$ pairs of agents. Having an estimate of $\sigma_d^2$ from (13.56), we can estimate the relative sizes of the simple response variance and the correlated component. In interview surveys the correlated component is usually found much larger than the simple response variance except for basic items such as age, sex, and marital status (Fellegi, 1964). Partly for this reason the 1970 U.S. Census used self-enumeration by mail extensively (Hansen and Waksberg, 1970).

If (13.55) holds, we can also study the ratio of the simple response variance to the sampling variance that applies to the sample in an agent's assignment. For under conditions A,

$$E\sum_{\alpha=1}^{2}\sum_{i=1}^{m}\frac{(y_{i\alpha} - \bar{y}_{\cdot\alpha})^2}{2(m-1)} = \sigma_d^2(1-\rho_w) + S_{\mu'}^2 \tag{13.58}$$

Pritzker and Hanson (1962) have called the ratio

$$I = \frac{\sigma_d^2}{\sigma_d^2 + S_{\mu'}^2}$$

the index of inconsistency. It is analogous to 1 minus the coefficient of reliability used in studying errors of measurement in psychology. If $\rho_w$ is negligible, $I$ can be estimated from the ratio of (13.56) to (13.58).

With a (0, 1) variate, equations (13.40) and (13.41) in section 13.10, putting $n = 1$, show that for a single measurement of a single unit, the sum of the simple response variance and the sampling variance is $V(y_{i\alpha}) \doteq PQ$, where $P = \sum P_i/N$, assuming $f$ negligible. The set of joint values $(y_{i1}, y_{i2})$ obtained by the two agents can be summarized in the following 2 x 2 frequency table.

Number of responses

                        Second agent
                        1         0         Total
First agent     1       a         b         a + b
                0       c         d         c + d
Total                   a + c     b + d     m

Thus, $b$ is the number of units on which the first agent records 1, the second agent 0. Under conditions A

$$\hat{\sigma}_d^2 = \frac{\sum (y_{i1} - y_{i2})^2}{2m} = \frac{(b + c)}{2m} \tag{13.59}$$

As an estimate of $I$ from the subsample, one choice is

$$\hat{I} = \frac{\hat{\sigma}_d^2}{\hat{P}\hat{Q}} = \frac{m(b + c)}{(a+b)(c+d) + (a+c)(b+d)} \tag{13.60}$$

which averages the $PQ$ estimates from the two sets of measurements. Estimates of the index have been published for both Census and sample survey items by Pritzker and Hanson (1962), Fellegi (1964), and Koons (1973). These estimates are useful in comparing the relative unreliabilities of measurement for different items, in successive censuses, or as between different methods of measurement, thus providing some appraisal of new methods of measurement.

The interpretation of these comparisons is, of course, often clouded by doubts as to whether conditions A apply. More complex estimates based on more realistic assumptions are given by Fellegi (1964).
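
The calculations in (13.59) and (13.60) need only the four cell counts of the 2 x 2 table. The sketch below is a minimal illustration; the function name and the counts are hypothetical, and conditions A are taken for granted.

```python
def index_of_inconsistency(a, b, c, d):
    """Estimate the simple response variance and the index I from a 2 x 2 table.

    a: both agents record 1      b: first agent 1, second agent 0
    c: first agent 0, second 1   d: both agents record 0
    """
    m = a + b + c + d
    # (13.59): discordant pairs give the simple response variance estimate
    sigma_d2_hat = (b + c) / (2 * m)
    # average of the two PQ estimates, one from each agent's margins
    pq_hat = ((a + b) * (c + d) + (a + c) * (b + d)) / (2 * m * m)
    # (13.60)
    return sigma_d2_hat, sigma_d2_hat / pq_hat


sigma_d2_hat, i_hat = index_of_inconsistency(40, 5, 7, 48)
print(sigma_d2_hat, i_hat)
```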


13.15 INTERPENETRATING SUBSAMPLES 


This technique, particularly useful for the study of correlated errors, was proposed by Mahalanobis (1946). To present it in the simplest terms, a random sample of $n$ units is divided at random into $k$ subsamples, each subsample containing $m = n/k$ units. The field work and processing of the sample are planned so that there is no correlation between the errors of measurement of any two units in different subsamples. For instance, suppose that the correlation with which we have to deal arises solely from biases of the interviewers. If each of $k$ interviewers is assigned to a different subsample and if there is no correlation between errors of measurement for different interviewers, we have an example of the technique.


With the same mathematical model, it is convenient to label the units by double subscripts. Let

$$y_{ij\alpha} = \mu_{ij}' + d_{ij\alpha} \tag{13.61}$$

where $i$ denotes the subsample (interviewer) and $j$ the member within the subsample. The fpc is ignored.

Since the $i$th subsample is a random subsample, it is itself a simple random sample of size $m$. Hence, by (13.51), the variance of its mean is

$$V(\bar{y}_{i\alpha}) = \frac{1}{m}\left\{S_{\mu'}^2 + \sigma_d^2[1 + (m-1)\rho_w]\right\} \tag{13.62}$$

where $\rho_w$ is the correlation between the $d_{ij\alpha}$ obtained by the same interviewer. Since errors are independent in the different subsamples,

$$V(\bar{y}_\alpha) = \frac{1}{k}V(\bar{y}_{i\alpha}) = \frac{1}{n}\left\{S_{\mu'}^2 + \sigma_d^2[1 + (m-1)\rho_w]\right\} \tag{13.63}$$

From the sample results we can compute an analysis of variance of the $km$ observations into the components "Between interviewers (subsamples)" with $(k-1)$ degrees of freedom and "Within interviewers" with $k(m-1)$ degrees of freedom. It is easy to verify that the expected values of the mean squares, averaged over selections of the interviewers and of the random samples and subsamples, work out as in Table 13.12.


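TABLE 13.12
EXPECTATIONS OF THE MEAN SQUARES (ON A SINGLE-UNIT BASIS)

Source                               df          ms                                                                       E(ms)
Between interviewers (subsamples)    k - 1       $s_b^2 = m\sum_i (\bar{y}_{i\alpha} - \bar{y}_\alpha)^2/(k-1)$            $S_{\mu'}^2 + \sigma_d^2[1 + (m-1)\rho_w]$
Within interviewers                  k(m - 1)    $s_w^2 = \sum_i\sum_j (y_{ij\alpha} - \bar{y}_{i\alpha})^2/k(m-1)$       $S_{\mu'}^2 + \sigma_d^2(1 - \rho_w)$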


Table 13.12 contains two important results. By comparison with (13.63) we see that $s_b^2/n$ is an unbiased estimate of $V(\bar{y}_\alpha)$, ignoring the fpc. Thus interpenetrating subsamples provide an estimate of $V(\bar{y}_\alpha)$ that takes proper account of both the simple response variance and the correlated component.

The analysis also enables us to estimate the correlated component, since

$$\frac{E(s_b^2 - s_w^2)}{m} = \rho_w\sigma_d^2 \tag{13.64}$$

Consequently, comparison of $(m-1)(s_b^2 - s_w^2)/m$ with $s_b^2$ estimates the relative amount which the correlated component of the response variance contributes to the total variance of $\bar{y}_\alpha$. With measurements in which the correlated component is much larger than the simple response variance, the ratio $(m-1)(s_b^2 - s_w^2)/ms_b^2$ has been used alternatively as a measure of the relative contribution of the total response variance to the total variance of $\bar{y}_\alpha$. Tepping and Boland (1972) present estimates of this ratio for items in the Current Population Survey.

When the interpenetration method is applied in a multistage sample covering a wide geographic area, the most common practice is to have pairs of interviewers measure interpenetrating subsamples drawn from the smallest clusters among the successive stages. In this way the number of ultimate units in an interviewer's assignment is kept at its customary level in the survey, although the interviewer has to travel over twice the usual area. For a single cluster, Table 13.12 provides 1 df between interviewers and $2(m-1)$ df within interviewers: the corresponding mean squares are averaged over the $c$ clusters chosen for the study. The sampling variance that is measured is, of course, only that within the last stage of clustering. As a reminder of this fact, the sampling variance term in $E(\text{ms})$ is sometimes written $S_{\mu'}^2(1-\rho_\mu)$ instead of $S_{\mu'}^2$, where $\rho_\mu$ is the intracluster correlation among the sampling errors for different subunits.
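
The analysis of variance of Table 13.12 and the estimates that follow from it can be sketched in a few lines. The code below assumes $k$ subsamples of equal size $m$, one per interviewer; the function name, the dictionary keys, and the toy data are hypothetical and serve only to show the arithmetic.

```python
def interpenetration_anova(subsamples):
    """Between/within mean squares for k interpenetrating subsamples of size m.

    subsamples: list of k lists, each holding the m recorded values y_ij
                obtained by one interviewer (hypothetical data layout).
    """
    k = len(subsamples)
    m = len(subsamples[0])
    if any(len(s) != m for s in subsamples):
        raise ValueError("all subsamples must have the same size m")
    n = k * m
    means = [sum(s) / m for s in subsamples]
    grand = sum(means) / k
    # between-interviewer mean square on a single-unit basis, (k - 1) df
    s_b2 = m * sum((mi - grand) ** 2 for mi in means) / (k - 1)
    # within-interviewer mean square, k(m - 1) df
    s_w2 = sum(
        (y - mi) ** 2 for s, mi in zip(subsamples, means) for y in s
    ) / (k * (m - 1))
    return {
        "s_b2": s_b2,
        "s_w2": s_w2,
        "var_of_mean": s_b2 / n,                  # estimate of V(y-bar)
        "rho_sigma_d2": (s_b2 - s_w2) / m,        # estimate from (13.64)
        "correlated_share": (m - 1) * (s_b2 - s_w2) / (m * s_b2),
    }


result = interpenetration_anova([[1.0, 3.0], [2.0, 6.0], [5.0, 7.0]])
print(result)
```

In practice the difference $s_b^2 - s_w^2$ can come out negative through sampling fluctuation; the sketch returns it unchanged rather than truncating it at zero.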

The interpenetration method was used in this form in Response Variance Study I by the U.S. Census Bureau (1968), designed to estimate the correlated components of the total response variances of items in the 1960 Census. The areas in this study were compact clusters of households, the clusters being scattered all over the U.S.A. In any sample cluster, two interpenetrating subsamples were formed, each subsample being assigned to a different interviewer.

In half the clusters the two interviewers had different crew leaders. In this half it was assumed, as seems reasonable, that the response errors of the two interviewers were uncorrelated. Thus $(s_b^2 - s_w^2)/m$ estimates $\rho_w\sigma_d^2$, where $s_b^2$ is now the average mean square between interviewers in the same cluster. In the other half of the clusters the two interviewers had the same crew leader. The objective here was to measure the extent to which "crew leader effect" induced a covariance between $\bar{d}_1$ and $\bar{d}_2$ for the two interviewers in a cluster. If so, $s_b^2$ in Table 13.12 now estimates

$$S_{\mu'}^2(1-\rho_\mu) + \sigma_d^2[1 + (m-1)\rho_w] - mE\operatorname{cov}(\bar{d}_1, \bar{d}_2)$$

and $(s_b^2 - s_w^2)/m$ estimates

$$\rho_w\sigma_d^2 - E\operatorname{cov}(\bar{d}_1, \bar{d}_2) \tag{13.65}$$

Comparison of the two sets of values of $(s_b^2 - s_w^2)/m$ reveals the presence of a "crew leader effect." Since differences between estimates of variance like $s_b^2$ and [...] two halves of this type of study [...] with any precision.

[...] stratified and multistage sampling. If the primary interest is in an unbiased estimate of $V(\bar{y}_\alpha)$ that takes proper account [...] errors of measurement are independent in different subsamples. Strictly, this requires that different interviewing teams, supervisors, and data processors be used in different subsamples. If $\bar{y}_{i\alpha}$ is the mean of the $i$th subsample, the quantity $\sum(\bar{y}_{i\alpha} - \bar{y}_\alpha)^2/k(k-1)$ is an unbiased estimate of $V(\bar{y}_\alpha)$, with $(k-1)$ df. This result holds because the subsample can be regarded as a single complex sampling unit, the sample being in effect a simple random sample of these complex units, with uncorrelated errors of measurement between different complex units. Consequently, the results in section 13.10 apply.

Numerous applications of this method, sometimes called replicated sampling, 
are described by Deming (1960), who has used the method extensively. For other 
discussions of its advantages, see Jones (1955) and Koop (1960). Travel costs of 
interviewers are increased by interpenetration, but this can be mitigated if the 
sample is stratified into compact areas. For instance, each stratum might contain
two random samples, each assigned to a different interviewer. Each interviewer is
required to travel over the whole stratum instead of over only half the stratum.
Every stratum provides 1 df for the estimate of $V(\bar{y}_\alpha)$.
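
The replicated-sampling estimate of variance needs only the subsample means. A minimal sketch, assuming $k$ equally sized subsamples with independent errors of measurement between them, is given below; the function name and the numbers are hypothetical.

```python
def replicated_variance(subsample_means):
    """Estimate V(y-bar) from k subsample means, with (k - 1) df.

    Computes sum (ybar_i - ybar)^2 / [k (k - 1)], where ybar is the
    unweighted mean of the k subsample means.
    """
    k = len(subsample_means)
    if k < 2:
        raise ValueError("at least two subsamples are needed")
    grand = sum(subsample_means) / k
    return sum((y - grand) ** 2 for y in subsample_means) / (k * (k - 1))


print(replicated_variance([2.0, 4.0, 6.0]))
```

Nothing in this calculation depends on how the units inside a subsample were selected or measured, which is why the method carries over to complex plans.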


13.16 COMBINATION OF INTERPENETRATION 
AND REPEATED MEASUREMENT 


As we have seen (Section 13.14), repeated measurement of the units in an 
agent’s assignment by another agent of similar quality provides estimates of the 
simple response variance and the total response variance if conditions A apply, 
although it may underestimate in interview surveys if the respondent’s errors on 
the two occasions are positively correlated. The interpenetration scheme (section 
13.15) provides estimates of the correlated component of the response variance 
and its contribution to the total variance, response variance plus sampling 
variance. 

Much more may be learned from an ingenious combination of interpenetration 
and repetition, as used by Fellegi (1964) in a study of response errors in the 1961 
Canadian Census of Population. The study was conducted in 134 Enumeration 
Areas (E.A.’s), each containing about 150 households, the size of an interviewer’s 
assignment. Contiguous E.A.’s were grouped into 67 pairs. Two interviewers
were assigned to each pair, each interviewing a random half of the households in 
the pair. Thus each enumerator had the regular work load, but spread over twice 
the area. Then the assignments of the two interviewers were switched, giving the 
desired combination of interpenetration and repeated measurement. 


If $S_1$, $S_2$ denote the two interpenetrating subsamples and $I_1$, $I_2$ the two interviewers, comparison of $(I_1S_1)$ with $(I_2S_1)$ or $(I_1S_2)$ with $(I_2S_2)$ gives the "repeated measurement" analysis, while comparison of $(I_1S_1)$ with $(I_2S_2)$ or $(I_1S_2)$ with $(I_2S_1)$ gives the "interpenetration" analysis. These comparisons lead to estimates of the simple response variance, the correlated component, the total response variance, and the index of inconsistency. The sampling variance involved is that between households within pairs of E.A.'s. More extensive analysis by Fellegi also estimates the covariance between sampling and response deviations for the same interviewer. This term has been neglected in the model presented here, but Fellegi shows that it may create sizable biases in the estimates of $\rho_w\sigma_d^2$.

A good exposition of the strengths and weaknesses of different variants of the repeated measurement and the interpenetration approaches has been given by Bailar and Dalenius (1969). Hansen and Waksberg (1970) review the research work of the U.S. Census Bureau on measurement errors as they affect the Census and some of the most important sample surveys taken by the Census Bureau. Resulting changes in the 1970 Census included more widespread use of self-enumeration (omitting interviewer biases) in a Census by mail, further use of sampling as distinct from a complete Census, and advance computer selection of the sample to avoid some biases that had been detected in selection of the final-stage sample by the interviewer. Disturbing errors found in data on occupation, industry, and housing quality as well as recall problems in expenditure data are under continuing study.


13.17 SENSITIVE QUESTIONS: RANDOMIZED RESPONSES 


A situation likely to lead either to refusals to answer or to evasive answers occurs when a question in a survey is sensitive or highly personal (e.g., does the respondent regularly engage in shoplifting or use drugs?). Consider first the estimation of a binomial proportion, the proportion $\pi_A$ of respondents who belong to a certain class A or have committed a certain act. By ingenious use of a randomizing device, Warner (1965) showed that it is possible to estimate this proportion without the respondent revealing his or her personal status with respect to this question. The objective is to encourage truthful answers while fully preserving confidentiality.

The randomizing device, such as a spinning arrow or a box with red and white balls, selects one of two statements or questions, each requiring a "yes" or "no" response, to be presented to the respondent. The interviewer does not know which question any respondent has answered, but does know the relative probabilities $P$ and $(1 - P)$ with which the two statements are presented. The success of the method depends, of course, on the respondent's being convinced that by participating he or she will not be revealing personal status with regard to the sensitive issue.


In Warner's original proposal the two statements are:

"I am a member of class A." (presented with probability P)
"I am not a member of class A."

With a random sample of $n$ respondents the interviewer records a binomial estimate $\hat{\phi} = m/n$ of the proportion $\phi$ of "yes" answers. If the questions are answered truthfully, the relation between $\phi$ and $\pi_A$ in the population is

$$\phi = P\pi_A + (1-P)(1-\pi_A) = (2P-1)\pi_A + (1-P) \tag{13.66}$$

With known $P$, this relation suggests the estimate

$$\hat{\pi}_{AW} = \frac{[\hat{\phi} - (1-P)]}{(2P-1)} \qquad (P \ne \tfrac{1}{2}) \tag{13.67}$$

This estimate turns out to be the maximum likelihood estimate of $\pi_A$. (The suffix W denotes "Warner".) The estimate is unbiased, with variance

$$V(\hat{\pi}_{AW}) = \frac{\phi(1-\phi)}{n(2P-1)^2} \tag{13.68}$$

Writing $(1-\phi)$ in the form

$$(1-\phi) = (2P-1)(1-\pi_A) + (1-P) \tag{13.69}$$

we find easily

$$V(\hat{\pi}_{AW}) = \frac{\pi_A(1-\pi_A)}{n} + \frac{P(1-P)}{n(2P-1)^2} \tag{13.70}$$

The first term in $V(\hat{\pi}_{AW})$ is the variance that $\hat{\pi}_A$ would have if all $n$ respondents answered truthfully a direct question about class A membership. Except for $\pi_A$ near $\frac{1}{2}$ and $P > 0.85$, the second term is greater than the first, often much greater. The method is thus quite imprecise in general. This might be expected, since the interviewer does not know whether a "yes" answer implies membership in class A or the opposite. As Warner showed, however, his method may give a smaller MSE than a direct sensitive question would, if the latter produced numerous refusals or false answers.
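
The Warner estimate (13.67) and the two terms of its variance (13.70) can be sketched as follows. The function names and the example figures (460 "yes" answers out of 1000 with $P = 0.7$) are hypothetical, and truthful answering is assumed throughout.

```python
def warner_estimate(yes_count, n, P):
    """Warner estimate (13.67) of pi_A from the number of "yes" answers."""
    if P == 0.5:
        raise ValueError("P must differ from 1/2")
    phi_hat = yes_count / n
    return (phi_hat - (1 - P)) / (2 * P - 1)


def warner_variance_terms(pi_A, n, P):
    """Return the two terms of (13.70): direct-question term, added term."""
    direct = pi_A * (1 - pi_A) / n
    added = P * (1 - P) / (n * (2 * P - 1) ** 2)
    return direct, added


print(warner_estimate(460, 1000, 0.7))
print(warner_variance_terms(0.4, 1000, 0.7))
```

With these assumed inputs the term added by randomization is several times the direct-question term, in line with the remark that the method is imprecise in general.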


13.18 THE UNRELATED SECOND QUESTION 


As an alternative to the Warner method, Simmons suggested (Horvitz, Shah, and Simmons, 1967) that respondent cooperation might improve if the second statement was not in any way sensitive, being unrelated to the first. For example,

"I was born in the month of May."

The first statement remains unchanged. If all respond truthfully, the population proportion of "yes" answers is now

$$\phi = P\pi_A + (1-P)\pi_U \tag{13.71}$$

where $\pi_U$ is the proportion in the sampled population who were born in May. If $\pi_U$ is known, the obvious (and maximum likelihood) estimate of $\pi_A$ is

$$\hat{\pi}_{AU} = \frac{[\hat{\phi} - (1-P)\pi_U]}{P} \tag{13.72}$$

with variance

$$V(\hat{\pi}_{AU}) = \frac{\phi(1-\phi)}{nP^2} \tag{13.73}$$

Morton (Greenberg et al., 1969, p. 532) has suggested how the case, $\pi_U$ known, can always be achieved. A box contains red, white, and blue balls in known proportions $P_1$, $P_2$, $P_3$. Drawing a red ball produces the sensitive statement. Drawing a white or a blue ball produces the statement: "The color of this ball is white." Thus, $\pi_U = P_2/(P_2 + P_3)$.

Dowling and Shachtman (1975) have shown that $V(\hat{\pi}_{AU}) < V(\hat{\pi}_{AW})$ for all $\pi_A$, $\pi_U$, provided that $P$ exceeds about $\frac{1}{3}$. (The variance of $\hat{\pi}_{AW}$ is symmetrical about $P = \frac{1}{2}$ but that of $\hat{\pi}_{AU}$ is not, a small $P$ providing few responses on the sensitive question with this method.)
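
For the case of known $\pi_U$, the estimate (13.72) and variance (13.73) can be sketched as below. The function names and the inputs ($P = 0.7$, $\pi_U = 0.1$, $\pi_A = 0.4$, $n = 1000$) are hypothetical values used only for illustration.

```python
def unrelated_question_estimate(yes_count, n, P, pi_U):
    """Estimate (13.72) of pi_A when the unrelated proportion pi_U is known."""
    phi_hat = yes_count / n
    return (phi_hat - (1 - P) * pi_U) / P


def unrelated_question_variance(pi_A, pi_U, n, P):
    """Variance (13.73), with phi taken from (13.71)."""
    phi = P * pi_A + (1 - P) * pi_U
    return phi * (1 - phi) / (n * P ** 2)


print(unrelated_question_estimate(310, 1000, 0.7, 0.1))
print(unrelated_question_variance(0.4, 0.1, 1000, 0.7))
```

For comparison, the Warner variance (13.68) at the same assumed $\pi_A$, $P$, and $n$ is $0.46 \times 0.54 / (1000 \times 0.4^2)$, which is larger.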

If it is necessary to estimate both $\pi_A$ and $\pi_U$ we can have two random samples of sizes $n_1$, $n_2$, with different proportions $P_1$, $P_2$ for the sensitive question. With $\phi_1$, $\phi_2$ denoting the proportions of "Yes" answers in the populations defined by the choices $P_1$ and $P_2$,

$$\phi_1 = P_1\pi_A + (1-P_1)\pi_U \tag{13.74}$$

$$\phi_2 = P_2\pi_A + (1-P_2)\pi_U \tag{13.75}$$

These relations suggest the estimate

$$\hat{\pi}_{AU} = \frac{[\hat{\phi}_1(1-P_2) - \hat{\phi}_2(1-P_1)]}{(P_1 - P_2)} \tag{13.76}$$

with

$$V(\hat{\pi}_{AU}) = \frac{1}{(P_1-P_2)^2}\left[\frac{\phi_1(1-\phi_1)(1-P_2)^2}{n_1} + \frac{\phi_2(1-\phi_2)(1-P_1)^2}{n_2}\right] \tag{13.77}$$

If $P_1 > \frac{1}{2}$, Greenberg et al. (1969) showed that this variance is minimized when $P_2 = 0$, that is, when all in the second ($n_2$) sample are asked the unrelated $\pi_U$ question. Moors (1971) has recommended this procedure, but Greenberg et al. (1969) suggest $P_1 + P_2 = 1$ as a working rule, in case the choice $P_2 = 0$ might weaken cooperation by respondents. When $P_1 = 0.8$, for example, 80% of sample 1 and 20% of sample 2 would be asked the sensitive question on the average.

With the optimum $n_1$, $n_2$ for given $n = n_1 + n_2$, the Cauchy-Schwarz inequality shows that the resulting minimum variance of $\hat{\pi}_{AU}$ is

$$V_{\min}(\hat{\pi}_{AU}) = \frac{1}{n(P_1-P_2)^2}\left[(1-P_2)\sqrt{\phi_1(1-\phi_1)} + (1-P_1)\sqrt{\phi_2(1-\phi_2)}\right]^2 \tag{13.78}$$

The minimizing $n_1/n_2$ ratio is

$$\frac{n_1}{n_2} = \frac{(1-P_2)\sqrt{\phi_1(1-\phi_1)}}{(1-P_1)\sqrt{\phi_2(1-\phi_2)}} \tag{13.79}$$

This choice requires advance estimates of $\pi_A$ and $\pi_U$, but the optimum is fairly flat. Greenberg et al. (1969) give recommendations about the choices of $P_1$, $P_2$, $n_1$, and $n_2$. For preservation of confidentiality it helps to have $\pi_U$ approximately equal to $\pi_A$.
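
The two-sample formulas (13.76) to (13.79) can be checked against one another numerically. In the sketch below the function names and the inputs ($P_1 = 0.8$, $P_2 = 0.2$, $\pi_A = 0.3$, $\pi_U = 0.5$, $n = 1000$) are hypothetical, and sample sizes are allowed to be fractional so that the optimum allocation can be evaluated exactly.

```python
from math import sqrt


def two_sample_estimate(phi1_hat, phi2_hat, P1, P2):
    """Estimate (13.76) of pi_A from the two observed "yes" proportions."""
    return (phi1_hat * (1 - P2) - phi2_hat * (1 - P1)) / (P1 - P2)


def two_sample_variance(phi1, phi2, P1, P2, n1, n2):
    """Variance (13.77) for given sample sizes n1, n2."""
    a = phi1 * (1 - phi1) * (1 - P2) ** 2 / n1
    b = phi2 * (1 - phi2) * (1 - P1) ** 2 / n2
    return (a + b) / (P1 - P2) ** 2


def optimal_ratio(phi1, phi2, P1, P2):
    """Minimizing ratio n1 / n2 from (13.79)."""
    return ((1 - P2) * sqrt(phi1 * (1 - phi1))) / (
        (1 - P1) * sqrt(phi2 * (1 - phi2))
    )


def minimum_variance(phi1, phi2, P1, P2, n):
    """Minimum variance (13.78) for total sample size n."""
    s = (1 - P2) * sqrt(phi1 * (1 - phi1)) + (1 - P1) * sqrt(phi2 * (1 - phi2))
    return s ** 2 / (n * (P1 - P2) ** 2)


P1, P2, pi_A, pi_U, n = 0.8, 0.2, 0.3, 0.5, 1000
phi1 = P1 * pi_A + (1 - P1) * pi_U   # (13.74)
phi2 = P2 * pi_A + (1 - P2) * pi_U   # (13.75)
r = optimal_ratio(phi1, phi2, P1, P2)
n1, n2 = n * r / (1 + r), n / (1 + r)
print(two_sample_estimate(phi1, phi2, P1, P2))
print(two_sample_variance(phi1, phi2, P1, P2, n1, n2))
print(minimum_variance(phi1, phi2, P1, P2, n))
```

Feeding the population values $\phi_1$, $\phi_2$ into the estimator returns $\pi_A$, and the variance at the allocation from (13.79) agrees with (13.78).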

Numerous variants of these methods have been studied (e.g., use of two 
unrelated questions), as well as the biases produced in $\hat{\pi}_{AW}$ and $\hat{\pi}_{AU}$ if a fraction 
of the respondents answer the question falsely. Greenberg et al. (1971) have 
applied the two-sample technique to estimate the mean $\mu_A$ for a sensitive discrete 
or continuous variable by methods analogous to those leading to equations 
(13.74) and (13.75). The unrelated question estimates the mean $\mu_U$ for a 
nonsensitive variable. Random subgroups of $n_i$ subjects (i = 1, 2) receive the 
sensitive question with probability $P_i$, the nonsensitive with probability $(1 - P_i)$. 
The recorded variable $z_i$ for a subject therefore follows a mixture of two 
distributions in proportions $P_i$, $(1 - P_i)$, one distribution with mean $\mu_A$, variance 
$\sigma_A^2$, the other with mean $\mu_U$, variance $\sigma_U^2$. Hence 


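$$E(z_i) = P_i\mu_A + (1 - P_i)\mu_U \qquad (13.80)$$

$$V(z_i) = P_i\sigma_A^2 + (1 - P_i)\sigma_U^2 + P_i(1 - P_i)(\mu_A - \mu_U)^2 \qquad (13.81)$$

Analogous to (13.76), the estimate of $\mu_A$ is 

$$\hat{\mu}_{AU} = \frac{(1 - P_2)\bar{z}_1 - (1 - P_1)\bar{z}_2}{P_1 - P_2} \qquad (13.82)$$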


For maximum efficiency the conditions $\mu_U = \mu_A$, $\sigma_U = 0$ are required, while for 
preservation of anonymity, $\mu_U = \mu_A$, $\sigma_U = \sigma_A$ are best. 
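
Equations (13.80) to (13.82) translate directly into code. The sketch below is an illustration added here, not taken from the text. The function names and numerical inputs are hypothetical, and the variance function assumes that the two subgroups are independent with $V(\bar{z}_i) = V(z_i)/n_i$, an assumption the text does not state explicitly.

```python
def mixture_moments(mu_a, mu_u, var_a, var_u, p):
    """Mean (13.80) and variance (13.81) of the recorded variable z."""
    mean = p * mu_a + (1 - p) * mu_u
    var = p * var_a + (1 - p) * var_u + p * (1 - p) * (mu_a - mu_u) ** 2
    return mean, var


def mu_au_estimate(zbar1, zbar2, p1, p2):
    """Two-sample estimate (13.82) of the sensitive mean mu_A."""
    return ((1 - p2) * zbar1 - (1 - p1) * zbar2) / (p1 - p2)


def mu_au_variance(mu_a, mu_u, var_a, var_u, p1, p2, n1, n2):
    """Variance of (13.82), assuming independent subgroups (see lead-in)."""
    _, v1 = mixture_moments(mu_a, mu_u, var_a, var_u, p1)
    _, v2 = mixture_moments(mu_a, mu_u, var_a, var_u, p2)
    return ((1 - p2) ** 2 * v1 / n1 + (1 - p1) ** 2 * v2 / n2) / (p1 - p2) ** 2


if __name__ == "__main__":
    # Hypothetical inputs: mu_A = 3, mu_U = 5, variances 4 and 1, P1 = 0.8, P2 = 0.2.
    e1, _ = mixture_moments(3.0, 5.0, 4.0, 1.0, 0.8)
    e2, _ = mixture_moments(3.0, 5.0, 4.0, 1.0, 0.2)
    # Feeding the expected subgroup means into (13.82) recovers mu_A.
    print(mu_au_estimate(e1, e2, 0.8, 0.2))
```

Under the same assumption, setting $\mu_U = \mu_A$ and $\sigma_U = 0$ in `mu_au_variance` gives a smaller variance than $\sigma_U = \sigma_A$, which matches the efficiency remark above.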

Warner (1971) has given a theoretical framework for a broad class of randomized 
response models. As (13.74) and (13.75) suggest, the trick is to estimate 
certain linear functions of the sensitive and unrelated $\pi$'s or $\mu$'s with as many 
equations as there are parameters to be estimated. 

The method has been applied to obtain estimates of the proportions of 
illegitimate births, of induced abortions, of users of heroin, of persons having 
contact with organized crime, and of mean income and number of abortions as 
continuous-discrete applications. Since the method has attracted widespread 
attention, further applications are likely to appear. An excellent review is given by 
Horvitz, Greenberg, and Abernathy (1975). 

There have recently been discussions of the degree of privacy that the respondents 
have in different versions of randomized interviewing. In some versions the 
interviewer may be able to guess the status of some respondents with regard to the 
sensitive issue with a fairly high probability of being correct, an undesirable 
feature for this method. 


13.19 SUMMARY 


In regard to their effects on the formulas given in preceding chapters, nonsam- 
pling errors may be classified as follows. 
1. With noncoverage and nonresponse, the most important consequence is that 
estimates may become biased, because the part of the population that is not 
reached may differ from the part that is sampled. There is now ample evidence 
that these biases vary considerably from item to item and from survey to survey, 
being sometimes negligible and sometimes large. A second consequence is, of 
course, that the variances of estimates are increased because the sample actually 
obtained is smaller than the target sample. This factor can be allowed for, at least 
approximately, in selecting the size of the target sample. 

2. Errors of measurement that are independent from unit to unit within the 
sample and average to zero over the whole population are properly taken into 
account in the usual formulas for computing the standard errors of the estimates, 
provided that fpc terms are negligible. Such errors decrease the precision of the 
estimates, and it is worthwhile to find out whether this decrease is serious. 

3. If errors of measurement on different units in the sample are correlated, the 
usual formulas for the standard errors are biased. The standard errors are likely to 
be too small, since the correlations are mostly positive in practice. This type of 
disturbance is easily overlooked and may often have passed unnoticed. 

4. A constant bias that affects all units alike is hardest of all to detect. No 
manipulations of the sample data will reveal this bias. 
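
Point 1 can be made concrete with a short calculation. The Python sketch below is an added illustration, not part of the original text. It borrows the response-stratum means quoted in exercise 13.1 and assumes a binomial variance with the fpc ignored; the function names are illustrative.

```python
from math import sqrt

# Data quoted from exercise 13.1: size of the "response" stratum (% of the
# population) mapped to the true mean percentage within that stratum.
STRATUM_MEAN = {60: 40.7, 80: 43.5, 90: 44.8, 95: 45.4}
LAST_5_PERCENT_MEAN = 59.0

# Whole-population percentage: the 95% stratum plus the last 5%.
POP_MEAN = 0.95 * STRATUM_MEAN[95] + 0.05 * LAST_5_PERCENT_MEAN


def rmse(response_pct, n):
    """Root mean square error (in %) with n completed questionnaires."""
    p = STRATUM_MEAN[response_pct]
    variance = p * (100 - p) / n      # binomial variance, fpc ignored
    bias = p - POP_MEAN               # nonresponse bias, unaffected by n
    return sqrt(variance + bias ** 2)


def n_required(response_pct, target_rmse):
    """Unrounded n giving the target RMSE; None if the bias alone exceeds it."""
    p = STRATUM_MEAN[response_pct]
    room = target_rmse ** 2 - (p - POP_MEAN) ** 2
    if room <= 0:
        return None
    return p * (100 - p) / room


if __name__ == "__main__":
    for pct in (60, 80, 90, 95):
        print(pct, n_required(pct, 5.0), n_required(pct, 2.0))
```

Because the bias term does not shrink with n, a method reaching only the 60% stratum cannot attain a root mean square error of 5% at any sample size, whereas the variance term can always be reduced by enlarging the target sample.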


EXERCISES 


13.1 Suppose that, by field methods of different intensities, it is possible to make the 
“response” stratum consist of 60, 80, 90, or 95% of the whole population. For a percentage 
that is to be estimated, the true “response” stratum means are: 60% stratum, 40.7; 80% 
stratum, 43.5; 90% stratum, 44.8; 95% stratum, 45.4; last 5%, 59.0. (a) For a method that 
samples only the 60% stratum, show that the root mean square error of the estimated 
percentage for the whole population is 

$$\sqrt{(2414/n) + 28.94}$$

where n is the number of completed questionnaires obtained. (b) Show that a root mean 
square error of 5% cannot be achieved by a method with 60% response but can be obtained 
with slightly over 100 completed questionnaires for the methods that have a response of 
80% or better. (c) If a root mean square error of 2% is prescribed, what methods can 
achieve it and what sample sizes are needed? 

13.2 In 13.1 (c) suppose that it costs $5 per completed questionnaire for the field 
method that has a 90% response. To obtain a completed questionnaire from the next 5% of 
the population costs $20. For a root mean square error of 2%, is it cheaper to use the 
method with 90% response rate or that with 95% response rate? 

13.3 A population consists of two strata of equal sizes. The probability of finding the 
respondent at home and willing to be interviewed at any call is 0.9 for persons in stratum 1 
and 0.4 for persons in stratum 2. (a) In the notation of section 13.5 show that 


$$w_{i1} = 1 - (0.1)^i, \qquad w_{i2} = 1 - (0.6)^i$$


(b) If the original sample size is $n_0$, compute the total expected number of interviews 
obtained for 1, 2, 3, 4 and 5 calls. (c) If the relative costs per completed interview at the ith 
call are 100, 120, 150, 200, and 300 for i = 1, 2,3, 4, 5, respectively, compute the average 
cost per interview for all interviews obtained up to the ith call. (d) The money available for 
the survey is enough to pay for 300 completed first calls. If the policy is to insist on i calls, 
what are the expected total numbers of completed interviews that can be obtained for the 
same amount of money when i = 1, 2, 3, 4, 5? 

13.4 In exercise 13.3 persons in stratum 1 have a mean of 40% for some binomial 
percentage that is being estimated and persons in stratum 2 have a mean of 60%. (a) 
Compute the bias in the sample mean for i = 1, 2,3, 4, 5 calls. (b) Compute the variances of 
the sample means for the cost situation in part (d) of exercise 13.3. (To save computing, the 
variance may be taken as $2600/n_i$, where $n_i$ is the expected total number of interviews 
obtained.) (c) Which policy gives the lowest MSE? 


13.5 In section 13.6 (subsampling of the nonrespondents) verify the formula (p. 373) 
for the ratio of the expected cost of obtaining a specified V with no subsampling to the 
minimum expected cost, 

$$\text{Ratio} = \frac{F(c_0 + c_1W_1 + c_2W_2)}{\left[\sqrt{(F - W_2)(c_0 + c_1W_1)} + W_2\sqrt{c_2}\right]^2}$$

where $F = S^2/S_2^2$. Let $c_0 = c_1 = 1$, $c_2 = 16$. (a) If F = 1, show that the cost ratio has a 
maximum 1.25 when $W_2$ = 0.2 or 0.25. (b) If F = 1.5, show that the maximum is 1.41 for ... 

13.6 In a survey on poultry and pigs kept in gardens and certain small holdings (Gray, 
1957) a postal inquiry with several reminders was followed by interviews of a subsample of 
nonrespondents. By advance judgment, k =2 was chosen (i.e., a 50% subsample). The 


following data were available after the survey for one important item, in the notation of 
exercise 13.5. 


Oe 0 fi 3 
TAOL ===9.5, W,=0.8, SEs? 
Co n 
By finding VC for k = 2 and for the optimum k, determine whether k = 2 was a good 
choice. 

13.7 In a survey by the Politz-Simmons method 390 respondents in an initial sample of 
660 were found at home on the first call. The numbers who stated that they were at home on 


0, 1, ..., 5 of the five previous nights and the number answering yes to a question in the 
survey were as follows. 


0/5 1/5 2/5 3/5 4/5 5/5 


Number Sie soe 55s a * 18 
Yes answers ae 13% 490 BOWM4D 156 


Compute the Politz-Simmons estimate of the proportion of “yes” answers in the 
population and compare it with the simple binomial estimate. 

13.8 A population with N = 6 contains three units for which the correct answer to a 
question is yes and three for which it is no. Owing to errors of measurement, the probability 
of obtaining a “yes” response on a yes unit is 0.9 and the probability of obtaining a “no” 
response on a no unit is also 0.9. (a) By working out the distribution of all possible 
responses for samples of size 2, show that the probabilities are 0.218, 0.564, and 0.218 that 
the sample gives 0, 1, 2 “yes” responses. (b) Show that the variance of the estimated 
proportion of “yes” responses is 0.1090. Verify results (13.40) and (13.41) in section 
13.10. (c) What would be the variance of the estimated proportion of “yes” responses if 
there were no errors of measurement? 

13.9 In part of the 1942 Bengal Labour Enquiry (Mahalanobis, 1946) a random 
sample of about 175 families was taken in each of three strata. The sample in each stratum 
was divided into five random subsamples, each assigned to a different interviewer. The five 


interviewers worked in all three strata. For expenditure on food, the relevant part of the 
analysis of variance (on a single-family basis) is as follows. 


df ms E(ms) 
Between interviewers 4 


22.3 o° toy +350, +1050? 


Interviews x strata 8 9.6 T, +0 +350 
Within subsamples 510 9.9 o,7+0/ 
If Sy Wi 


represent biases of interviewer i, the model for a single family is 
Yne = Hy + 8 + Whi PO ha ~A) + drija 
Variances: TF Os Oo? oe 


2 

Verify the expressions given for E(ms) and estimate the proportion of the total variance 
of the mean that may be ascribed to enumerator biases. 

13.10 Consider an illegal act that 10% of the population have committed ($\pi_A = 0.1$). If 
all respondents answer truthfully, compare the $V(\hat{\pi}_A)$ for n = 500 given by (a) a direct 
sensitive question, (b) the Warner method with P = 0.8, (c) the unrelated question method 
with $\pi_U = 0.2$ known, (d) the two-sample unrelated question method with $\pi_U$ actually 0.2 
but unknown, when $P_1 = 0.8$, $P_2 = 0$ (as recommended by Moors), (e) the same method when 
$P_1 = 0.8$, $P_2 = 1 - P_1$. Assume that you can use the optimal $n_1/n_2$ in (d) and (e). Some 
decimals are avoided by calculating $V(100\hat{\pi}_A) = 10^4 V(\hat{\pi}_A)$ for the methods. 
13.11 In exercise 13.10, suppose that all respondents answer truthfully with any of the 
randomized response methods (b), (c), (d), or (e), but that under a single direct sensitive 
question, method (a), some respondents who have committed the act deny this. For 
n = 500, which of the randomized response methods give a smaller $\text{MSE}(\hat{\pi}_A)$ than method 
(a) if (i) 15%, (ii) 20%, (iii) 25% of those who have committed the act deny it under 
method (a)? 
Answers to Exercises 


1.1 (a) Examples of problems of definition are decisions whether to count words in a preface or 
index and how mathematical symbols are treated as “words.” In a book such as this one with many 
mathematical symbols, however, it seems unlikely that a count of words, either including or omitting 
symbols, would be wanted. (b) (1) The pages constitute a convenient frame. A disadvantage of the 
page as a sampling unit in which we count all words on any sample page is that with numerous 
illustrations the number of words per page may be quite variable because of the incomplete pages. It 
may be worthwhile first to list all the incomplete pages, forming two subpopulations or strata, one of 
incomplete pages and one of complete pages, that are sampled separately by the method of stratified 
sampling described in Chapter 5. (2) A problem with the line is that obtaining a listing of lines so that 
lines can be sampled directly is time-consuming. Also, there are incomplete lines at the end of 
paragraphs. Since words per line should be fairly stable, however, the solution may be to use two-stage 
sampling (Chapter 11), first drawing a sample of pages and then counting the number of lines on each 
selected page and drawing a subsample of lines on these pages. 

1.2 This question supposes that we first draw a sample of cards with equal probability. (a) If 
sample names not in the target population are discarded, the only problem is that the size of the sample 
of names from the target population will generally be less than the number of cards and will be a 
random variable, depending on the cards that happen to be chosen. (b) The problem is that names 
appearing on several cards have higher probabilities of selection. One way of handling this is to count 
the number of cards on which a selected name appears and use this number in making the estimate by 
methods appropriate to selection with unequal probabilities (Chapter 9A). Another way that gives 
each name an equal chance but may involve many rejections is to retain a card only if it is the first of the 
set on which this name appears. (c) As in (b), names are being selected with unequal probabilities. I 
know of no easy method of giving each name an equal chance. If the number of cards on which each 
name appears has been recorded somewhere, an unequal-probability method can be used, as in (b). 

1.3 Suggestions are: (a) a recent directory of department and luggage stores, (b) the repositories 
for lost articles maintained by the subway and bus companies, and (c) hospitals and private physicians 
in the geographical area in which snake bites occur, plus any public health organization to which 
reporting of bites is compulsory. Weaknesses in all three frames are likely to be incompleteness, plus 
high costs in (c) if snake bites are rare and not centrally reported. (d) A list of households is often used 
as a frame for selecting a sample of families. Although there will be some incompleteness (families who 
cannot be reached), the major problem may be errors in measurement. 

1.4 A problem is incompleteness because of new construction. In a sample of addresses, new 
dwellings can usually be handled by the interviewer, who checks, for any sample address, whether 
there are new dwellings between this address and the next address in the directory and, if so, includes 
these new dwellings in the sample. Whole areas of new construction may not be mentioned in the 
directory and require development of a separate frame. Drawing a list of addresses is preferable to 
drawing a list of persons, since addresses are more permanent. For reasons of travel expense, however, 
the sampling unit may be a city block from which a subsample of dwellings is selected. 

1.5 $80,390 and $82,970. 


1.6 The confidence probability is about 0.054 (found from t = −1.67 with 25 degrees of 
freedom). This assumes that future receipts follow the same frequency distribution as the sample of 26 
receipts, and that this distribution is normal. 
1.7 When the MSE is due entirely to bias, the estimate is always wrong by $1\sqrt{\text{MSE}}$. The 
probability of an error $\geq 1\sqrt{\text{MSE}}$ is therefore unity and the probability of an error $\geq 1.96\sqrt{\text{MSE}}$ or 
$\geq 2.576\sqrt{\text{MSE}}$ is zero. 


2.4 Y=51,473. Probability about 0.9. 

2.5 Yes, o(¥) is 98.4. 

2.6 Y=20,238, s(¥)=849. 

2.7 (a) Public: $\hat{R}$ = 15.46. Private: $\hat{R}$ = 12.75. (b) Public: $s(\hat{R})$ = 0.761. Private: $s(\hat{R})$ = 0.727. 
For the fpc we take f = 100/468. (c) 14.2 < R < 16.7. 

2.8 Diff./s.e.(diff.) = 2.71/1.186 = 2.28. P about 0.023. Note that the fpc is not used in computing 
s.e.(diff.). 

2.9 (a) 9408, s.e. = 780; (b) 9472, s.e. = 1104. 

2.10 S.e. (in 1000's) = (a) 14,800; (b) 3900; (c) 3140. 

2.11 9.2. (a) 2.7; (b) 2.4. 

2.12 (a) n= 60, with 30 from each domain; (b) n = 80 will do if the number of owners in the 
sample lies anywhere between 20 and 60. With n = 80, the probability that this happens is only about 
0.54 (from the binomial tables). With n = 100, any sample number of owners between 19 and 81 will 
suffice, the probability that this happens being about 0.94. 


2.14 (a) 420; (b) 490; (c) both are unbiased; (d) estimate (b). 


3.2 1066, 1334 as given by the normal approximation, equation 3.19. 
3.3 Nearly conclusive. 
3.6 (a) 76.2 ± 3.6%; (b) 1738 ± 280 families. 
3.7 1789 ± 268 families. 
3.8 As an exact result, 
vÂ) _ nQ, 
on 
VIA, Neny(l=7)(Q,+ Pim) 


Now N, = N- 7), and in large samples n, = n(1— m). These substitutions give the stated result, In 
order that V(A,) V(A,’) be small, we must have 7(1—Q,)/Q, large. This means that Q, must be 
small: in other words, the proportion of domain 1 that lies in class C must be large. For given Q}, m 
should be large. 

3.9 All give Ay = 13. By the hypergeometric, the probability of no units in C in the sample is 
0.0601 for Ay = 12 and 0.0434 for Ay = 13. By the binomial, Py = 0.4507 and V1- fPy = 0.4114, 
giving Ay = 12.3. Page 59, Ex. 3 gives 0.061 for Ay = 12 and 0.044 for Ay = 13. 

3.11 Estimate (b) seems more precise. 

3.12 The highest value is PQ/n as compared with PQ/mn by the binomial formula. This occurs 
when every cluster consists entirely of 1's or entirely of 0’s. The lowest value can be zero if every cluster 
gives the same proportion P. (This is possible only for certain values of P and m). 

3.13 Variance is 0.00184 by the ratio method and 0.00160 by the binomial formula. 

3.14 Average size of sample = m/P. 


4.1 735 houses. This sample size is needed for two-car households if P= 10%. 
4.2 About 260 sheets. 

4.3 (a) 2475; (b) 4950. 

4.4 n=21 (taking t= 2). 

4.5 n = 484. For number of unemployed, the cv would be about 15%. 
4.6 62 more. 
4.7 (a) n=278; (b) n=2315; (c) n=3046. 
4.8 If a rectangular distribution is assumed within each class, we take $S^2 = 0.083h^2$ or S = 0.29h. 
This gives estimates of 230, 580, 2030, and 11,600 in the four classes. If the right-triangular 
distribution is used in the fourth class, we take S = 0.24h, giving 9600 for this class. 


UNS \2/3 
4.11 n, =| =) 
GENNA 


4.12 (a) n= 1250; (b) n = 679. In this part, we can give a dummy variate y; the value +100 for a 
(Yes, No) answer, —100 for a (No, Yes) answer, and 0 otherwise. Then Ÿ = P, ~ P, in percentages. 
With an advance sample of n; = 200, formula (4.7) in Section 4.7 can be used, giving n = 679. 


4.13 (a) The probability that a family of four persons has 1,2, and 3 females is approximately 1/4, 
1/2, and 1/4, respectively. For estimating the proportion P of females, a simple random sample of n 
families gives V(P) = 0.03125/n, as against 0.0625/n for a sample of 4n persons. The deff factor is 
about 1/2. In the corresponding example in Table 3.5 with 30 households of unequal sizes, the 
estimated deff factor was 0.475. (b) The deff factor would be slightly raised by families with identical 
twins, since the proportions of families with 1 and 3 females would be slightly increased. 


5.1 (a) Neyman allocation gives $n_1 = 0.87$, $n_2 = 3.13$. (b) There are three possible estimates 
under optimum allocation and nine under proportional allocation. $V_{opt}(\bar{y}_{st}) = 1/6 = 0.167$; 
$V_{prop}(\bar{y}_{st}) = 7/12 = 0.583$. (d) Formula 5.27 gives $V_{opt}(\bar{y}_{st}) = 0.159$. 

5.2 (a) $n_1 = 375$, $n_2 = 625$; (b) $n_1 = 750$, $n_2 = 250$. 

5.3 RP=181% for proportional allocation and 214% for optimum allocation. 

5.5 When W_1 = W_2 the relative increases equal 0.029 for c_2/c_1 = 2 and 0.111 for c_2/c_1 = 4. 

5.6 (a) n_1/n = 1/3, n_2/n = 2/3; (b) n = 264, n_1 = 88, n_2 = 176; (c) $1936. 

5.7 (a) $2288 against $1936. (b) No. The minimum field cost to reduce V to 1 is $2230. 

5.8 (a) $n_1 = 384$, $n_2 = 192$; (b) $n_1 = 400$, $n_2 = 1600$; (c) $n_1 = 1200$, $n_2 = 2400$. 

5.9 Fractional increase = 4, 

5.10 $n_1 = 541$, $n_2 = 313$, $n_3 = 146$. 


5.12 In population 1, $V_{prop} = 0.143/n$; $V_{opt} = 0.134/n$. In population 2, $V_{prop} = 0.0491/n$, 
$V_{opt} = 0.0423/n$. The reduction in variance from optimum allocation is about 6% in population 1 
as against 14% in population 2. 

5.14 (a) If we guess $P_1$ = 45%, $P_2$ = 25%, $P_3$ = 7.5% as a compromise, this gives $n_1 = 268$, 
$n_2 = 116$, $n_3 = 16$; (b) s.e. = 0.0225; (c) s.e. = 0.0241. 

5.15 As n approaches N, a stage is reached in which the standard formula $n_h \propto N_hS_h$ for Neyman 
optimum allocation is no longer applicable, since it would require $n_h > N_h$ in at least one stratum. As 
noted in Section 5.8, formula (5.27) then ceases to hold. The student is in error if he claims that (5.27) is 
always wrong; the formula has a limited range which, however, covers nearly all applications. 


5A.4 No. In each of the worst cases $[\sum (w_h - W_h)\bar{Y}_h]^2$ is $(0.105)^2 = 0.0110$. Thus, with 
stratification, $\text{MSE}(\bar{y}_{st})$, as given by formula (5A.6), is 0.0108 + 0.0110 = 0.0218. With simple random 
sampling, $V(\bar{y}) = 0.0177$. 

5A.6 (a) n = 1024. The optimum allocation for the second variate (average amount invested) 
satisfies both requirements. 

5A.7 $W_1 = 0.728$, $W_2 = 0.272$, $S_1 = 1.806$, $S_2 = 4.698$ (in the coded scale). (a) The optimum 
sample sizes are $n_1 = 0.507n$, $n_2 = 0.493n$. (b) $V(\bar{y}) = 31.95/n$, $V_{opt}(\bar{y}_{st}) = 6.72/n$. 

5A.8 $\int_0^a \sqrt{f(y)}\,dy = \int_0^a \sqrt{2(1 - y)}\,dy = 2\sqrt{2}\,[1 - (1 - a)^{3/2}]/3$. Hence we want $[1 - (1 - a)^{3/2}] = \ldots$ 

5A.9 In Exercise 5A.7, $W_1 = 0.728$, as in the Dalenius-Hodges rule, comes closest to satisfying 
the Ekman rule. In Exercise 5A.8, the Ekman rule gives $a = (3 - \sqrt{5})/2 = 0.38$. 
5A.10 The optima are L = 7 for p = 0.95, L = 5 for p = 0.9, and L = 4 for p = 0.8. Either L = 5 or 
L =6 is a good compromise. 

5A.11 (a) Gain in precision is about 110%. (b) Gain from proportional stratification over simple 
random sampling is about 90%. 

5A.12 Increase n, as the hint suggests, leave n = 400. We require n, = 140, giving n = 540. 


6.1 For the ratio estimate $V(\hat{Y}_R) = N^2(1 - f)S_d^2/n$ and for simple expansion $V(\hat{Y}) = 
N^2(1 - f)S_y^2/n$, where d = (y − Rx). For the sample of 21 households the estimates of $S_d^2$ and $S_y^2$ are as 
follows. Number of children, $s_d^2 = 0.49$, $s_y^2 = 1.61$; number of cars, $s_d^2 = 0.41$, $s_y^2 = 0.39$; number of 
TV sets, $s_d^2 = 0.51$, $s_y^2 = 0.45$. The ratio estimate appears superior for children. 


6.2 Gain = 66%. At least 11 units by the ratio method. 

6.3 Quadratic limits (27,100, 29,870); normal limits (27,030, 29,700). 

6.4 Apply theorem 6.3 to the estimation of R = Y/X. With large samples, use 9/X if r = (cv of 
x)/2(cv of y), and use ¥/¥ otherwise, where r and the cv's are sample estimates. 


6.5 The MSE's are 46.5 for the separate ratio estimate and 40.6 for the combined ratio estimate. 
In both cases the contribution of bias to the MSE is negligible. 


6.6 For Lahiri’s method, V(Yp.)=40.1. 

6.7 Estimated population total = 116.21 millions. The relative variance is 0.00111, so that the 
s.e. is (0.0333)(116.21) = 3.87 millions. The estimate is within 1 s.e. of the true total. 

6.8 The estimates are (a) 1896, (b) 1660, (c) 1689. In (c) we find w,=2.38, w =-—1.38, 
Estimated s.e.’s are (a) 256, (b) 36.9, (c) 18.6. For the s.e. in (6) I used the formula s.e. = 
Ëk VO -Alyy Fe11—261)/n, where Sa is the ratio estimate of Y, that is, 1660. For the s.e. in (c) I 
used Fare = 1689. 


7.1 Estimate = 11,080; s.e. = 152 (including the fpc). 

7.2 No, since b is very close to 1. 

7.3 $\hat{Y}_{lr}$ = 28,177 ± 570. The relative precision is 113%. 

7.4 27,751 ± 694. 

7.6 For the difference estimate, V(y)=S./n, for the linear regression estimate, VF) = 
S283 /n(S2+ Sy 2). The regression estimate has the smaller variance, but its superiority is unimportant 
if 7/8, is Euall, 

7.7 MSE(¥;,;)= 34.5, Bias? = 9.7; MSE( Yj.) = 11.9, Bias? = 1.2. 

7.9 MSE(Yp;)= 8.9, MSE(¥p-) = 6.7. 


8.1 Variances are 8.19 (systematic), 11.27 (simple random), 8.25 (stratified, 2), 7.46 (strati- 
fied, 1). 


8.2 V.,,=0.00141, Vian = 0.00340. 

8.3 The systematic sample should be superior for the proportion of people of Polish descent, 
since this variable exhibits geographical stratification. It is likely to be inferior for proportion of 
children because the sampling interval, 1 in 5, coincides with the average size of a household. The same 
is true, though to a smaller extent, for proportion of males. 

8.4 The variances are as follows. Males, V,,, = 0.0204, V,,, = 0.0216; children, V,,, = 0.0204, 
Vsys = 0.0776; professional, V, = 0.0192, Vys = 0.0016. 

8.5 Actual variance = 8.19. Method (a) gives 11.29. For method (b) the estimated variance from 


a single sample is (1—f\(¥i1— ¥i2)°/4, where ¥,1, Fiz are the means of the two halves. The average is 
3.24. The serious underestimation is unexpected. 


8.7 Both variances are (k?>—1)/6. 
8.8 Simple random sampling is better unless n = 1 or k = 1. 
8.9 Every kth unit, V(¥)=362.2; Yates, MSE(Y)= 7.3; Sethi, V(¥)=21.0; Singh et al., 
V(Y)=81.0. The last two estimators are unbiased in this example. 


9.1 Relative costs of using the four types of unit are as 100, 90.1, 79.7, and 77.8 (taking the first 
unit as a standard). 


9.2 Relative precision of the household is 211% for the sex ratio and 38% for the proportion who 
had seen a doctor. 


9.3 Relative precision of the large unit is 0.566 with simple random sampling and 0.625 with 
stratified random sampling. 


9.5 (a)M=S;(b)M=1. 
9.6 The optimum M should decrease because travel cost, which varies as $\sqrt{n}$, becomes relatively 
less important when n increases. 
9A.1 (a) 34,242; (b) 5534; (c) 6493. 
9A.3 (a) If the s.d. among large units in class h oc M,,. (b) If probability V Mp. 
„9A.4 (b) V(Yur)= 1.75, V( Êu) = 0.27, VY ruc) = 0.33. In this problem y,/z, varies little, but 
Ysrr performs relatively poorly because with the method of sample selection m; #2z,„ Murthy's 


estimator is devised for this method of sample selection having V(x) =0 if y,/z) is constant, as 
(9A.60) shows. (c).V(¥n4)/V( Yop: ) = 0.54, while (N=n)/(N-1) =}. 


9A.5 MSE(Y7ir.4)) = 7.06, (Bias) /MSE = 0.065, V(Ysps)=6.5. 


10.1 (a) 2.00; (b) 2.13. 

10.3 (a) 165/n; (b) 148.5/n; (c) 132/n. 

10.4 (a) n = 660 fields; (b) n = 530 fields. Protein requires fewer fields than yield. 
10.5 ¢,/c,=8. 

10.7 (a) 0.93%; (b) 0.51%; (c) 0.36%. 


10.8 (a) Either mo =7 or mo=8 is suitable; (b) 89% for mo = 7 and 93% for my = 8; (c) 86% for 
mo=7 and 89% for mo =8. 


11.2 The relative precision of III to II drops from 3.02 to 2.75. If two sampling plans differ 
primarily in their between-units contribution to the variance, the relative precision of the superior plan 
will in general decrease as the ratio of the within-units variance to the total variance increases. 


11.3 The explanation is, roughly speaking, that with these data the Y,/z, are more stable than the 
¥/M,. If we took z; = 35, 35, and 33, the between-units contribution in method IV would vanish. 
11.4 Total variance: 0.00504 (Ia), 0.02358 (II), 0.00554 (III), 


11.6 Estimated percentage 14.2 ± 2.16. Estimated number 3540 ± 540. 

11.7 Estimated percentage 13.9 ± 2.49. 

11.9 Total rooms, 29,400, total persons, 50,550, persons per room, 1.72; s.e.'s: total persons, 
2,440, persons per room 0.066. 


12.1 n= 267, n'= 1320 or n = 268, n'= 1280. V(p,,) with optimum allocation is 6.67 when Pst is 
in %'s. With single sampling, V(p) = 8.33. 

12.2 $c_n/c_{n'} > 9$. 

12.3 n/n'=1/19. 

12.4 n'>16n. 

12.5 By formula (12.67), s.e. = 1.25, ignoring 1/N. 

12.6 Percent gains from the second to the sixth occasion are 50, 75, 91, 100, and 105, respectively. 


12.8 The values of nV S? and nV(¥2')/S? are as follows: « = 4, p = 0.8: 0.885, 0.875; u =}; 
p =0.9: 0.843, 0.840; p =}, p =0.8: 0.824, 0.810; 2 =4, p = 0.9: 0.752, 0.746. 

12.12 (a) Let y; = 1 for any unit that has the first attribute and y; = 0 otherwise and let stratum 1 be 
the stratum in which every unit has the second attribute. In the notation of theorems 12.1, 12.2, with 
1/N negligible, S? = P,Q, and S,? = P,P2/(P; + P2)*. Results in (a) follow from the theorems; (b) if 
C* is the expected cost, double sampling with v,= 1/2, k=2 (the optimum) gives V(Ŷ,.)= 
N?(0.844)/C*, while a simple random sample gives V(¥) = N7(1.875)/C*, over twice as large. 


13.1 (c) 90% response with 1047 completed questionnaires or 95% response with 701 completed 
questionnaires. 

13.2 The method with 90% response costs $5235. That with 95% response costs $5.7895 per 
completed questionnaire, or $4058 total cost. 

13.3 (b) $0.65n_0$, $0.815n_0$, $0.8915n_0$, $0.9351n_0$, $0.9611n_0$; (c) 100, 104, 108, 112, 117; (d) 300, 
288, 277, 267, 256. 

13.4 (a) Bias (in %) = —3.85, —2.15, —1.21, —0.69, —0.40; (b) variances are 8.67, 9.03, 9.39, 9.74, 
10.16; (c) four calls. 

13.6 Yes. VC for k = 2 is only about 2% over the minimum VC. 

13.7 Politz-Simmons estimate, 39.7%; binomial, 42.3%. 

13.8 (c) Variance = 0.1. 

13.9 If each enumerator’s error of CHENG wie independent from family 1 to family, the 
variance of the sample mean would be (op? +o +o +r. 3)/525 instead of (o, ltor +350 + 
1050;°)/525. Enumerator biases contribute about 55% of the total variance. 

13.10 $10^4 V(\hat{\pi}_A) = V(100\hat{\pi}_A)$ = (a) 1.80 (Direct); (b) 10.69 (Warner); (c) 3.30 ($\pi_U$ known); (d) 
5.12 (Moors); (e) 6.30 ($P_2 = 1 - P_1$). 

13.11 (a) $10^4\,\text{MSE}(\hat{\pi}_A) = 3.81$; $\hat{\pi}_{AU}$ is superior if $\pi_U$ is known. (b) $10^4\,\text{MSE}(\hat{\pi}_A) = 5.47$; the 
two-question $\hat{\pi}_{AU}$ is also superior if $P_2 = 0$. (c) $10^4\,\text{MSE}(\hat{\pi}_A) = 7.64$. All methods are superior except 
Warner's original method. 
Proportions, estimation of, 50 
in cluster sampling, 64-68, 246 
in double sampling, 330 
effect of nonresponse, 361 
effect of population P on precision, 53 
with more than two classes, 60, 61 
in simple random sampling, 50-68 
size of sample for, 74 
in stratified random sampling, 107-111 
accuracy of large-sample variance, 197 
bias, 198 
compared with mean per unit, 195 
compared with ratio estimate, 195 
conditions under which unbiased, 199 
in double sampling, 338 
effect of error in slope, 192 
estimated variance, 195 
uses, 189 
variance in large samples, 194 
see also Combined regression estimate; 
Separate regression estimate 
Reinterviews in study of errors of measurement, 
385, 386 
Relative precision, 99 
method of calculating, 103 
Repeated measurements, 386, 391 
Repeated sampling of population, 344 
composite estimate, 354 
current estimates, 346-355 
estimates of change, 345, 352 
optimum percent matched, 347, 350 
rotation policies, 354 
use of nonrespondents from previous surveys, 
377 
Response deviation, 379 
Response variance, 379 
correlated component, 383 
simple, 383 
total, 383 


Sampled population, 5 
Sampling fraction, 21 
first-stage, 277 
second-stage, 277 
Sampling unit, definition, 6 
Sampling with replacement (WR), 18, 29 
Sampling without replacement (WOR), 18 
Satterthwaite’s approximation to number of df, 96 
Self-weighting estimate, 91 
in two-stage sampling, 303, 304, 307, 317 
Sensitive questions, 392 
randomized response method, 392-395 
Separate ratio estimate, 164 
compared with combined ratio estimate, 167 
estimated variance, 167 
liability to bias, 165 
optimum allocation, 172 
variance, 164 
Separate regression estimate, 201 
compared with combined regression estimate, 
203 
estimated variance, 202 
liability to bias, 202 
variance, 202 
Short-cuts, in computation of variance of ratio 
estimate, 173 
Simple expansion, 169 
Simple random sampling, 18 
for classification into more than two classes, 60 
confidence limits: 




for sample mean, 27 
for sample proportion, 57 
distribution of sample proportion, 55-57 
estimated variance, 202 
of sample mean, 26 
of sample proportion, 52 
method of drawing, 19 
optimum linear estimator of population mean, 
44 
sample size: 
needed for means, 77 
sample size needed for proportions, 75 
Simple response variance, 383 
Size of sample for specified limits of error: 
analysis of problem, 73 
for comparisons between domains, 83 
with continuous data, 77 
Cox's method of two-step sampling, 78-80 
for means over domains, 82
by minimizing cost plus loss due to errors, 83
with more than one item, 81 
for normal approximation to confidence limits
for continuous data, 41-44
for normal approximation to confidence limits of
proportions, 58
with proportions, 75 
with rare items, 76 
in stratified random sampling, 105, 110 
Skewness: 
coefficient of, 42, 197 
effect of stratification on, 44 
effect on confidence limits, 41 
Square grid sample, 227 
Standard deviation in finite population, 23 
Standard error: 
of difference between domain means, 39 
of difference between two ratios, 180 
of domain mean, 33-34 
in stratified sampling, 145-147 
of domain total, 35 
in stratified sampling, 143 
of estimated population total from simple 
random sample, 24 
of mean: 
in cluster sampling, 240 
of simple random sample, 24, 25-27 
in stratified sampling, 91-98 
of systematic sample, 207-212 
in three-stage sampling, 286-287 
of proportion: 
in cluster sampling, 64-68 
in simple random sampling, 51-55 
in stratified sampling, 107-109
in two-stage sampling, 279 
over a domain, 62-63 
of ratio estimate, 31-32, 153-156 
in stratified sampling, 164-167 
of ratio in two-stage sampling, 311 
of regression estimate, 190, 194, 195 
of regression in stratified sampling, 200-203 
of sample standard deviation, 43 
of total in population possessing some attribute, 
52 
Standard error (approximate) of nonlinear 
estimators, 318 
balanced repeated replications, 320 
comparison of methods, 322-324 
jackknife method, 321 
Taylor series method, 319 
Steps in a sample survey, 4 
Strata, 89 
construction, 127-131 
optimum number, 132-134 
Stratification, 89 
best variable for, 101 
effect of number of strata on precision, 132 
effect on normality of variate, 44 
two-way, 124 
Stratified random sampling, 89 
compared with simple random sampling, 
99-101 
compared with ratio estimate, 169 
compared with systematic sampling, 209-223 
construction of strata, 127-131 
estimate p_st, 107-108
estimate ȳ_st, 91-92
estimate of gain in precision, 136
estimated variance of ȳ_st, 95
estimated variance of p_st, 108
estimates for domains of study, 142-144 
Neyman allocation, 98-99 
with one unit per stratum, 138 
optimum allocation for fixed cost, 96-98 
with ratio estimates, 164 
with regression estimates, 200 
size of sample, 105, 110
type of population giving large gains, 101 
Subpopulations, see Domains of study 
Subunits (elements), 233 
Superpopulation, 158 
Systematic sampling, 205 
in autocorrelated populations, 219 
compared with simple random sampling, 
208-221 
compared with stratified sampling, 209-223 
effect of periodic variation, 217 
end corrections, 216 
estimation of variance, 223-226 
method of drawing, 206 
in natural populations, 221 
in populations: 
in "random" order, 212
with linear trend, 214-217 
in single-stage cluster sampling, 265 
stratified systematic sampling, 226 
in two dimensions, 227 
in two-stage sampling, 279 
recommendations about use, 229 
relation to cluster sampling, 207 
variance of mean, 207-212 


Target population, 5 
Taylor series method, 319 
Theory, in sample surveys, 8 
Three-stage sampling, 285 
optimum sampling and subsampling fractions, 
288 
variance of mean per third-stage unit, 286-287 
Total response variance, 383 
Travel costs, formula, 96 
Two-phase sampling, see Double sampling 
Two-stage sampling (units of equal size): 
advantage, 274 
optimum sampling and subsampling fractions, 
280 
stratified sampling of the primary units, 
288-289 
table for selecting optimum size of subsample, 
283 
use of pilot survey, 283 
variance: 
of estimated mean, 276 
of estimated proportions, 279 
Two-stage sampling (units of unequal size), 292 
comparison of methods, 310 
methods with one unit per stratum, 293-299 
optimum sampling and subsampling fractions, 
313-316 
ratio estimators, 311 
units chosen: 
with equal probabilities, 293-295, 303-305 
with unequal probabilities WOR, 308-310 
with unequal probabilities WR, 306-308 


Unaligned systematic sample, 228 
Unbiased estimate, definition, 11 
Unit (sampling unit), definition, 6
United States Census, use of sampling, 2-3 
Uses of sample surveys, 2-4 
by business and industry, 3
in decennial censuses, 2-3
in market research, 3
by members of United Nations, 2
in opinion polls, 4


Variance, definition of S² and σ², 22, 25
Variance function, 243
Variance of population, advance estimates, 78-81 
Variance of sample estimates, see Standard error 